Search Results - Inner Mongolia University Library

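Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=institution (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase and must have one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, excluding subject Visual, years 2011-2016)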
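
To make the syntax rules concrete, here is a minimal Python sketch that assembles query strings in this format. The helpers `term`, `combine`, and `group` are hypothetical and not part of the library's system; they only enforce the rules stated above (known field codes, uppercase operators, one space around each operator) and say nothing about how the catalog resolves precedence when AND, OR, and NOT are mixed.

```python
# Field codes and operators as listed in the catalog help text (assumed complete).
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build a single FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(op: str, *clauses: str) -> str:
    """Join clauses with an uppercase operator and one space on each side."""
    op = op.upper()  # the help text requires uppercase operators
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op!r}")
    return f" {op} ".join(clauses)


def group(expr: str) -> str:
    """Parenthesize a sub-expression so it is evaluated as one unit."""
    return f"({expr})"


if __name__ == "__main__":
    # Rebuild Example 1 from the help text.
    query = combine(
        "AND",
        group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
        term("A", "范并思"),
        term("Y", "1982-2016"),
    )
    print(query)
```

Running it prints the query from Example 1. Grouping explicitly with parentheses, as Example 1 does, avoids relying on the catalog's operator precedence.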

Refine search results

Document type

Conference papers: 11,885
Journal articles: 5

Holdings scope

Electronic documents: 11,890
Print holdings: 0

Subject classification

Engineering: 8,059
- Computer Science and Technology...: 7,617
- Mechanical Engineering: 796
- Electrical Engineering: 688
- Software Engineering: 360
- Control Science and Engineering: 228
- Optical Engineering: 40
- Bioengineering: 19
- Information and Communication Engineering: 17
- Biomedical Engineering (may confer...: 12
- Electronic Science and Technology (may...: 6
- Architecture: 6
- Transportation Engineering: 6
- Instrument Science and Technology: 5
- Chemical Engineering and Technology: 5
- Safety Science and Engineering: 5
- Civil Engineering: 4
Medicine: 3,347
- Clinical Medicine: 3,346
- Basic Medicine (may confer medical...: 4
- Public Health and Preventive Medi...: 4
Science: 253
- Systems Science: 198
- Physics: 32
- Biology: 21
- Mathematics: 18
- Statistics (may confer science, ...: 9
- Chemistry: 7
Management: 17
- Management Science and Engineering (may...: 12
- Library, Information and Archives Manage...: 7
- Business Administration: 5
Law: 3
- Sociology: 3
Education: 3
- Education: 3
Agriculture: 2
Economics: 1
Military Science: 1

Topic

computer vision: 5,633
training: 2,668
pattern recognit...: 2,203
computational mo...: 1,747
visualization: 1,502
three-dimensiona...: 1,360
semantics: 1,074
benchmark testin...: 999
codes: 986
computer archite...: 959
deep learning: 891
conferences: 777
task analysis: 754
feature extracti...: 700
transformers: 561
face recognition: 533
neural networks: 527
object detection: 495
image segmentati...: 490
cameras: 468

Institution

univ sci & techn...: 174
carnegie mellon ...: 145
univ chinese aca...: 144
tsinghua univ pe...: 144
chinese univ hon...: 134
zhejiang univ pe...: 110
peng cheng lab p...: 109
swiss fed inst t...: 99
tsinghua univers...: 91
shanghai ai lab ...: 90
sensetime res pe...: 87
shanghai jiao to...: 86
zhejiang univers...: 83
tech univ munich...: 82
university of sc...: 79
stanford univ st...: 79
univ hong kong p...: 78
australian natl ...: 77
alibaba grp peop...: 76
peng cheng labor...: 75

Author

timofte radu: 75
van gool luc: 64
zhang lei: 50
yang yi: 43
loy chen change: 37
tao dacheng: 36
zhou jie: 32
chen chen: 31
liu yang: 30
tian qi: 30
sun jian: 29
zha zheng-jun: 29
li xin: 28
qi tian: 27
vasconcelos nuno: 26
liu xiaoming: 25
darrell trevor: 25
zheng wei-shi: 24
luo ping: 24
ying shan: 24

Language

English: 11,863
Other: 26
Chinese: 1

Search criterion: "Any field = 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,890 records in total; showing records 331-340 below.

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cao, Xu; Zhou, Tong; Ma, Yunsheng; Ye, Wenqian; Cui, Can; Tang, Kun; Cao, Zhipeng; Liang, Kaizhao; Wang, Ziran; Rehg, James M.; Zheng, Chao

Affiliations: Tencent T Lab, Palo Alto, CA 94306, USA; Univ Illinois, Champaign, IL, USA; Purdue Univ, W Lafayette, IN, USA; Univ Virginia, Charlottesville, VA, USA; SambaNova Syst Inc, Palo Alto, CA, USA

ISBN: (print) 9798350353006

Vision-language generative AI has demonstrated remarkable promise for empowering cross-modal scene understanding of autonomous driving and high-definition (HD) map systems. However, current benchmark datasets lack multi-modal point cloud, image, and language data pairs. Recent approaches utilize visual instruction learning and cross-modal prompt engineering to expand vision-language models into this domain. In this paper, we propose a new vision-language benchmark that can be used to finetune traffic and HD map domain-specific foundation models. Specifically, we annotate and leverage large-scale, broad-coverage traffic and map data extracted from huge HD map annotations, and use CLIP and LLaMA-2 / Vicuna to finetune a baseline model with instruction-following data. Our experimental results across various algorithms reveal that while visual instruction-tuning large language models (LLMs) can effectively learn meaningful representations from MAPLM-QA, there remains significant room for further advancements. To facilitate applying LLMs and multi-modal data to self-driving research, we will release our visual-language QA data and the baseline models at ***/LLVM-AD/MAPLM.

Keywords: High-definition (HD) Map; Large Language Model; Multimodal Learning; Vision-Language Model; Visual Question Answering

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Yangyi; Sikka, Karan; Cogswell, Michael; Ji, Heng; Divakaran, Ajay

Affiliations: SRI Int, Menlo Pk, CA 94025, USA; Univ Illinois, Champaign, IL 61820, USA

ISBN: (print) 9798350353006

We present DRESS, a large vision language model (LVLM) that innovatively exploits natural language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to enhance alignment with human preferences. Without incorporating extra feedback, they are still prone to generate unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is generally structured in a multi-turn dialogue format, the connections and dependencies among consecutive conversational turns are weak. This reduces the capacity for effective multi-turn interactions. To tackle these, we propose a novel categorization of the NLF into two key types: critique and refinement. The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences. The refinement NLF offers concrete suggestions for improvement and is adopted to improve the interaction ability of the LVLMs, which focuses on LVLMs' ability to refine responses by incorporating feedback in multi-turn interactions. To address the non-differentiable nature of NLF, we generalize conditional reinforcement learning for training. Our experimental results demonstrate that DRESS can generate more helpful (9.76%), honest (11.52%), and harmless (21.03%) responses, and more effectively learn from feedback during multi-turn interactions compared to SOTA LVLMs.

Keywords: Alignment; Interaction; Large Vision Language Models; Natural Language Feedback

TIM: A Time Interval Machine for Audio-Visual Action Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chalk, Jacob; Huh, Jaesung; Kazakos, Evangelos; Zisserman, Andrew; Damen, Dima

Affiliations: Univ Bristol, Bristol, Avon, England; Univ Oxford, VGG, Oxford, England; Czech Tech Univ, Prague, Czech Republic

ISBN: (print) 9798350353006

Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events. We propose the Time Interval Machine (TIM) where a modality-specific time interval poses as a query to a transformer encoder that ingests a long video input. The encoder then attends to the specified interval, as well as the surrounding context in both modalities, in order to recognise the ongoing action. We test TIM on three long audio-visual video datasets: EPIC-KITCHENS, Perception Test, and AVE, reporting state-of-the-art (SOTA) for recognition. On EPIC-KITCHENS, we beat previous SOTA that utilises LLMs and significantly larger pre-training by 2.9% top-1 action recognition accuracy. Additionally, we show that TIM can be adapted for action detection, using dense multi-scale interval queries, outperforming SOTA on EPIC-KITCHENS-100 for most metrics, and showing strong performance on the Perception Test. Our ablations show the critical role of integrating the two modalities and modelling their time intervals in achieving this performance. Code and models at: https://***/JacobChalk/TIM.

Keywords: action detection; action recognition; audio-visual learning; egocentric videos; video understanding

MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Abdelfattah, Mohamed; Hassan, Mariam; Alahi, Alexandre

Affiliations: Ecole Polytech Fed Lausanne (EPFL), Lausanne, Switzerland

ISBN: (print) 9798350353006

Current transformer-based skeletal action recognition models tend to focus on a limited set of joints and low-level motion patterns to predict action classes. This results in significant performance degradation under small skeleton perturbations or changing the pose estimator between training and testing. In this work, we introduce MaskCLR, a new Masked Contrastive Learning approach for Robust skeletal action recognition. We propose an Attention-Guided Probabilistic Masking strategy to occlude the most important joints and encourage the model to explore a larger set of discriminative joints. Furthermore, we propose a Multi-Level Contrastive Learning paradigm to enforce the representations of standard and occluded skeletons to be class-discriminative, i.e., more compact within each class and more dispersed across different classes. Our approach helps the model capture the high-level action semantics instead of low-level joint variations, and can be conveniently incorporated into transformer-based models. Without loss of generality, we combine MaskCLR with three transformer backbones: the vanilla transformer, DSTFormer, and STTFormer. Extensive experiments on NTU60, NTU120, and Kinetics400 show that MaskCLR consistently outperforms previous state-of-the-art methods on standard and perturbed skeletons from different pose estimators, showing improved accuracy, generalization, and robustness. Project website: https://***.

Keywords: contrastive learning; Skeleton-based action recognition
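
The abstract describes the masking strategy only at a high level. As a hedged illustration, the sketch below samples joints to occlude with probability given by a softmax over per-joint attention scores, so the most attended joints are the most likely to be hidden. The softmax temperature, zeroing as the occlusion value, the `(T, J, C)` array layout, and the function name are assumptions, not details taken from MaskCLR; the multi-level contrastive loss is not shown.

```python
import numpy as np


def attention_guided_mask(skeleton, joint_attention, num_masked,
                          temperature=1.0, rng=None):
    """Occlude joints sampled in proportion to softmax(attention / temperature).

    skeleton:        array (T, J, C) of frames x joints x coordinates (assumed layout)
    joint_attention: array (J,) of per-joint importance scores
    num_masked:      number of distinct joints to occlude
    temperature:     lower values concentrate sampling on the top-scored joints
    Returns the occluded copy and the sorted indices of the masked joints.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(joint_attention, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    # Draw distinct joints; highly attended joints are the most likely picks.
    masked = rng.choice(len(probs), size=num_masked, replace=False, p=probs)
    occluded = np.array(skeleton, dtype=float, copy=True)
    occluded[:, masked, :] = 0.0  # hide the chosen joints in every frame
    return occluded, np.sort(masked)
```

Sampling, rather than always masking the top-k joints, varies which joints are hidden across training iterations; lowering `temperature` moves the behavior toward deterministic top-k masking.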

Prompting Vision Foundation Models for Pathology Image Analysis

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yin, Chong; Liu, Siqi; Zhou, Kaiyang; Wong, Vincent Wai-Sun; Yuen, Pong C.

Affiliations: Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China; Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen, Peoples R China; Chinese Univ Hong Kong, Dept Med & Therapeut, Hong Kong, Peoples R China

ISBN: (print) 9798350353006

The rapid increase in cases of non-alcoholic fatty liver disease (NAFLD) in recent years has raised significant public concern. Accurately identifying tissue alteration regions is crucial for the diagnosis of NAFLD, but this task presents challenges in pathology image analysis, particularly with small-scale datasets. Recently, the paradigm shift from full fine-tuning to prompting in adapting vision foundation models has offered a new perspective for small-scale data analysis. However, existing prompting methods based on task-agnostic prompts are mainly developed for generic image recognition, which fall short in providing instructive cues for complex pathology images. In this paper, we propose Quantitative Attribute-based Prompting (QAP), a novel prompting method specifically for liver pathology image analysis. QAP is based on two quantitative attributes, namely K-function-based spatial attributes and histogram-based morphological attributes, which are aimed at quantitative assessment of tissue states. Moreover, a conditional prompt generator is designed to turn these instance-specific attributes into visual prompts. Extensive experiments on three diverse tasks demonstrate that our task-specific prompting method achieves better diagnostic performance as well as better interpretability. Code is available at https://***/7LFB/QAP.

Keywords: pathology image analysis; Prompt; quantitative attributes
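
The abstract names K-function-based spatial attributes and histogram-based morphological attributes without giving formulas. For readers unfamiliar with the K function, the sketch below computes the textbook naive Ripley's K estimator (no edge correction) for 2-D point positions, plus a normalized histogram of a per-object measurement. Which objects are measured, the radii and bins used, and how QAP's conditional prompt generator consumes these vectors are not specified in the abstract; the function names are hypothetical.

```python
import numpy as np


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate for 2-D points (no edge correction).

    K(r) = area / (n * (n - 1)) * #{ordered pairs (i, j), i != j, with d_ij <= r}
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points")
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # pairwise distances (n, n)
    np.fill_diagonal(dists, np.inf)             # drop self-pairs
    counts = np.array([(dists <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))


def morphology_histogram(values, bins, value_range):
    """Normalized histogram of a per-object measurement, e.g. object area."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)
```

For a completely random point pattern, K(r) is close to πr², so values clearly above that suggest clustering at scale r; the naive estimator is biased low near the region boundary, which is why edge-corrected variants exist.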

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hoe, Jiun Tian; Jiang, Xudong; Chan, Chee Seng; Tan, Yap-Peng; Hu, Weipeng

Affiliations: Nanyang Technol Univ, Sch EEE, Singapore, Singapore; Univ Malaya, CISiP, Kuala Lumpur, Malaysia

ISBN: (print) 9798350353013; 9798350353006

Large-scale text-to-image (T2I) diffusion models have showcased incredible capabilities in generating coherent images based on textual descriptions, enabling vast applications in content generation. While recent advancements have introduced control over factors such as object localization, posture, and image contours, a crucial gap remains in our ability to control the interactions between objects in the generated content. Controlling interactions well in generated images could yield meaningful applications, such as creating realistic scenes with interacting characters. In this work, we study the problems of conditioning T2I diffusion models with Human-Object Interaction (HOI) information, consisting of a triplet label (person, action, object) and corresponding bounding boxes. We propose a pluggable interaction control model, called InteractDiffusion, that extends existing pre-trained T2I diffusion models to enable them to be better conditioned on interactions. Specifically, we tokenize the HOI information and learn their relationships via interaction embeddings. A conditioning self-attention layer is trained to map HOI tokens to visual tokens, thereby conditioning the visual tokens better in existing T2I diffusion models. Our model attains the ability to control the interaction and location on existing T2I diffusion models, which outperforms existing baselines by a large margin in HOI detection score, as well as fidelity in FID and KID. Project page: https://***/interactdiffusion.

Keywords: computer vision; conditional image generation; diffusion model; generative ai; human-object interaction; image generation

RoMa: Robust Dense Feature Matching

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Edstedt, Johan; Sun, Qiyu; Bokman, Georg; Wadenback, Marten; Felsberg, Michael

Affiliations: Linkoping Univ, Linkoping, Sweden; East China Univ Sci & Technol, Shanghai, Peoples R China; Chalmers Univ Technol, Gothenburg, Sweden

ISBN: (print) 9798350353006

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, we propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality. Finally, we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method, RoMa, achieves significant gains, setting a new state-of-the-art. In particular, we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at ***/Parskatt/RoMa.

Keywords: 3D vision; dense feature matching; dense matching; feature matching; geometry estimation; image matching; two-view geometry
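
The abstract mentions regression-by-classification without detail. The sketch below shows the generic idea under stated assumptions: a decoder scores a fixed grid of candidate 2-D positions (anchors), the scores become a probability distribution that may have several modes, and the final estimate is a probability-weighted average restricted to anchors near the most likely one. The grid resolution, the `radius` window, and the function names are hypothetical, and RoMa's subsequent robust regression with fine features is not shown.

```python
import numpy as np


def anchor_grid(resolution):
    """Anchor coordinates on a regular resolution x resolution grid in [-1, 1]^2."""
    axis = np.linspace(-1.0, 1.0, resolution)
    ys, xs = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)  # (resolution**2, 2)


def decode_match(logits, anchors, radius):
    """Turn per-anchor logits into one 2-D match location.

    1. Softmax over anchors (classification); the result may be multimodal.
    2. Find the most probable anchor (the mode).
    3. Average only anchors within `radius` of the mode, weighted by probability.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    mode = anchors[np.argmax(probs)]
    near = np.linalg.norm(anchors - mode, axis=-1) <= radius
    weights = probs * near  # zero weight outside the local window
    return (weights[:, None] * anchors).sum(axis=0) / weights.sum()
```

Restricting the average to a window around the mode keeps a second, competing mode from pulling the estimate to a point between the two; a plain expectation over all anchors would not.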

Domain Prompt Learning with Quaternion Networks

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cao, Qinglong; Xu, Zhengqin; Chen, Yuntian; Ma, Chao; Yang, Xiaokang

Affiliations: Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China; Eastern Inst Technol, Ningbo Inst Digital Twin, Ningbo, Peoples R China

ISBN: (print) 9798350353006

Prompt learning has emerged as a potent and resource-efficient technique in large vision-language models (VLMs). However, its application in adapting VLMs to specialized domains like remote sensing and medical imaging, termed domain prompt learning, remains relatively unexplored. Although large-scale domain-specific foundation models offer a potential solution, their focus on a singular vision level presents challenges in prompting both vision and language modalities. To address this limitation, we propose leveraging domain-specific knowledge from these foundation models to transfer the robust recognition abilities of VLMs from generalized to specialized domains, employing quaternion networks. Our method entails utilizing domain-specific vision features from domain-specific foundation models to guide the transformation of generalized contextual embeddings from the language branch into a specialized space within quaternion networks. Furthermore, we introduce a hierarchical approach that derives vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features. Through this mechanism, quaternion networks can effectively explore intermodal relationships in specific domains, facilitating domain-specific vision-language contrastive learning. Extensive experiments conducted on domain-specific datasets demonstrate that our proposed method achieves new state-of-the-art results in prompt learning. Code is available at https://***/caoql98/DPLQ.

Keywords: domain prompt learning; quaternion networks; vision-language models
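
The abstract does not describe the layer design. As background only: quaternion networks represent features as quaternions with components (r, i, j, k) and replace real matrix products with the Hamilton product. The sketch below implements the Hamilton product and a generic quaternion dense layer; it is standard quaternion algebra, not the DPLQ architecture, and the function names are illustrative.

```python
import numpy as np


def hamilton(p, q):
    """Hamilton product of quaternions given as arrays (..., 4) = (r, i, j, k)."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k
    ], axis=-1)


def quaternion_linear(x, w_r, w_i, w_j, w_k):
    """Quaternion dense layer: output = W (Hamilton product) x.

    x:   array (4, in_features) holding the r, i, j, k components of the input
    w_*: arrays (out_features, in_features), one per quaternion component
    Returns an array (4, out_features).
    """
    r, i, j, k = x
    return np.stack([
        w_r @ r - w_i @ i - w_j @ j - w_k @ k,
        w_r @ i + w_i @ r + w_j @ k - w_k @ j,
        w_r @ j - w_i @ k + w_j @ r + w_k @ i,
        w_r @ k + w_i @ j - w_j @ i + w_k @ r,
    ])
```

Because the same four weight matrices are reused for all four output components, such a layer maps 4 × in real inputs to 4 × out outputs with 4 × out × in parameters, a quarter of what an unconstrained real dense layer of that shape would need.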

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lin, L.; Guan, Haoyan; Qiu, Jianing; Spratling, Michael

Affiliations: Kings Coll London, London, England; Imperial Coll London, London, England

ISBN: (print) 9798350353006

Large pre-trained vision-language models (VLMs) like CLIP, despite having remarkable generalization ability, are highly vulnerable to adversarial examples. This work studies the adversarial robustness of VLMs from the novel perspective of the text prompt instead of the extensively studied model weights (frozen in this work). We first show that the effectiveness of both adversarial attack and defense is sensitive to the text prompt used. Inspired by this, we propose a method to improve resilience to adversarial attacks by learning a robust text prompt for VLMs. The proposed method, named Adversarial Prompt Tuning (APT), is effective while being both computationally and data efficient. Extensive experiments are conducted across 15 datasets and 4 data sparsity schemes (from 1-shot to full training data settings) to show APT's superiority over hand-engineered prompts and other state-of-the-art adaptation methods. APT demonstrated excellent abilities in terms of the in-distribution performance and the generalization under input distribution shift and across datasets. Surprisingly, by simply adding one learned word to the prompts, APT can significantly boost the accuracy and robustness (ε = 4/255) over the hand-engineered prompts by +13% and +8.5% on average respectively. The improvement further increases, in our most effective setting, to +26.4% for accuracy and +16.7% for robustness. Code is available at https://***/TreeLLi/APT.

Keywords: adversarial examples; adversarial robustness; CLIP; text prompting; vision-language models; VLMs
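
APT itself learns a text prompt; the abstract only quotes the attack budget ε = 4/255, which conventionally denotes an L∞ bound: no pixel value (on a 0 to 1 scale) may change by more than 4/255. As a generic illustration of that constraint, not of APT's training procedure or the paper's exact evaluation attack, the sketch below runs an L∞ projected gradient descent (PGD) attack against a toy logistic-regression model. The step size, the step count, the toy model, and the function names are assumptions.

```python
import numpy as np

EPS = 4 / 255  # attack budget quoted in the abstract


def linf_pgd(x, grad_fn, eps=EPS, step=1 / 255, steps=10):
    """L-infinity PGD: signed gradient ascent on the loss, projected after each step.

    x:       clean input with values in [0, 1]
    grad_fn: returns d(loss)/d(input) at a given input (model-specific)
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))  # increase the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)               # keep valid pixel values
    return x_adv


def logistic_input_grad(w, b, y):
    """Gradient w.r.t. x of log(1 + exp(-y * (w @ x + b))) for a toy linear model."""
    def grad(x):
        margin = y * (w @ x + b)
        return -y * w / (1.0 + np.exp(margin))
    return grad
```

Each iteration takes a signed gradient step that increases the loss, then projects back into the ε-ball around the clean input and into the valid pixel range.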

Robust Emotion Recognition in Context Debiasing

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yang, Dingkang; Yang, Kun; Li, Mingcheng; Wang, Shunli; Wang, Shuaibing; Zhang, Lihua

Affiliations: Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China; Cognit & Intelligent Technol Lab (CIT Lab), Beijing, Peoples R China; Jilin Prov Key Lab Intelligence Sci & Engn, Changchun, Peoples R China; Minist Educ, Engn Res Ctr AI & Robot, Shanghai, Peoples R China

ISBN: (print) 9798350353006

Context-aware emotion recognition (CAER) has recently boosted the practical applications of affective computing techniques in unconstrained environments. Mainstream CAER methods invariably extract ensemble representations from diverse contexts and subject-centred characteristics to perceive the target person's emotional state. Despite advancements, the biggest challenge remains due to context bias interference. The harmful bias forces the models to rely on spurious correlations between background contexts and emotion labels in likelihood estimation, causing severe performance bottlenecks and confounding valuable context priors. In this paper, we propose a counterfactual emotion inference (CLEF) framework to address the above issue. Specifically, we first formulate a generalized causal graph to decouple the causal relationships among the variables in CAER. Following the causal graph, CLEF introduces a non-invasive context branch to capture the adverse direct effect caused by the context bias. During the inference, we eliminate the direct context effect from the total causal effect by comparing factual and counterfactual outcomes, resulting in bias mitigation and robust prediction. As a model-agnostic framework, CLEF can be readily integrated into existing methods, bringing consistent performance gains.

Keywords: Counterfactual inference; Emotion recognition
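
The abstract describes removing the direct context effect from the total causal effect. The sketch below shows only the standard counterfactual arithmetic behind that idea, for a generic two-input model Y(s, c) of subject features s and context features c, with reference ("absent") inputs s* and c*. The `model` callable, the all-zero references, and the toy linear scorer are assumptions; CLEF's actual branches and fusion are not given in the abstract.

```python
import numpy as np


def total_indirect_effect(model, subject, context, subject_ref, context_ref):
    """Counterfactual debiasing for a two-input model Y(subject, context).

    TE  = Y(s, c)  - Y(s*, c*)   total effect of observing both inputs
    NDE = Y(s*, c) - Y(s*, c*)   natural direct effect of the context alone
    TIE = TE - NDE = Y(s, c) - Y(s*, c)
    """
    baseline = model(subject_ref, context_ref)
    te = model(subject, context) - baseline
    nde = model(subject_ref, context) - baseline
    return te - nde


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy linear scorer over 3 emotion classes (hypothetical weights).
    w_s = rng.standard_normal((3, 4))  # subject branch
    w_c = rng.standard_normal((3, 4))  # context branch, the source of bias

    def linear_scorer(s, c):
        return w_s @ s + w_c @ c

    s = rng.standard_normal(4)
    zero = np.zeros(4)
    for c in (rng.standard_normal(4), rng.standard_normal(4)):
        print(total_indirect_effect(linear_scorer, s, c, zero, zero))
```

With the toy linear scorer, the returned scores equal `w_s @ s` and do not change when the context is swapped, which is the sense in which the direct context effect has been removed.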

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
