ISBN: (Print) 9798350353006
Test-time adaptation with pre-trained vision-language models has attracted increasing attention as a way of tackling distribution shifts at test time. Although prior studies have achieved very promising performance, they involve intensive computation, which is poorly suited to test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging the key-value cache, TDA adapts to test data gradually via progressive pseudo-label refinement, which is highly efficient and incurs no backpropagation. In addition, we introduce negative pseudo labeling, which alleviates the adverse impact of pseudo-label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo-label predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency compared with the state of the art. The code has been released at https://***/tda/.
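To make the cache mechanism concrete, the following minimal Python sketch keeps a per-class queue of confident test-sample features and converts key affinities into auxiliary logits; the queue capacity, the entropy-based ranking, and the exponential affinity-to-logit mapping are illustrative assumptions rather than TDA's exact design.

```python
import torch

class DynamicCache:
    """A hypothetical training-free key-value cache in the spirit of TDA."""

    def __init__(self, num_classes, capacity=8):
        self.num_classes = num_classes
        self.capacity = capacity                       # max entries per class
        self.items = {c: [] for c in range(num_classes)}

    def update(self, feat, pseudo_label, entropy):
        """Enqueue an L2-normalized test feature under its pseudo label,
        keeping only the lowest-entropy (most confident) entries."""
        queue = self.items[pseudo_label]
        queue.append((feat, float(entropy)))
        queue.sort(key=lambda e: e[1])
        del queue[self.capacity:]

    def logits(self, feat, beta=5.0):
        """Turn query-key affinities into one auxiliary logit per class."""
        scores = feat.new_zeros(self.num_classes)
        for c, queue in self.items.items():
            if queue:
                keys = torch.stack([k for k, _ in queue])   # (n, d)
                affinity = keys @ feat                      # cosine similarity
                scores[c] = torch.exp(-beta * (1.0 - affinity)).sum()
        return scores
```

At test time one would blend these cache logits with the frozen model's zero-shot logits, e.g. `logits = clip_logits + alpha * cache.logits(feat)`, so adaptation costs only a queue update per sample.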
ISBN: (Print) 9798350353006
Vision-language (VL) models have achieved unprecedented success recently, and the connection module is key to bridging the modality gap. Nevertheless, the abundant visual cues are not sufficiently exploited in most existing methods. On the vision side, most existing approaches use only the last feature of the vision tower, without using the low-level features. On the language side, most existing methods introduce only shallow vision-language interactions. In this paper, we present a vision-inspired vision-language connection module, dubbed VIVL, which efficiently exploits visual cues for VL models. To take advantage of the lower-level information from the vision tower, a feature pyramid extractor (FPE) is introduced to combine features from different intermediate layers, which enriches the visual cue with negligible parameter and computation overhead. To enhance VL interactions, we propose deep vision-conditioned prompts (DVCP) that allow deep interactions of vision and language features efficiently. Our VIVL exceeds the previous state-of-the-art method by 18.1 CIDEr when training from scratch on the COCO caption task, which greatly improves data efficiency. When used as a plug-in module, VIVL consistently improves performance for various backbones and VL frameworks, delivering new state-of-the-art results on multiple benchmarks, e.g., NoCaps and VQAv2.
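The FPE idea, fusing intermediate vision-tower features at low cost, might look roughly like the following sketch; the per-layer linear projections and simple summation are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FeaturePyramidExtractor(nn.Module):
    """Hypothetical FPE: fuse features from several intermediate layers
    with lightweight per-layer projections."""

    def __init__(self, dim, num_levels=3):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_levels)])
        self.norm = nn.LayerNorm(dim)

    def forward(self, level_feats):
        # level_feats: list of (batch, tokens, dim) tensors from the vision tower
        fused = sum(p(f) for p, f in zip(self.proj, level_feats))
        return self.norm(fused)
```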
ISBN: (Print) 9798350353013; 9798350353006
Single image super-resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges remain, particularly the limited receptive field caused by window-based self-attention. To address these issues, we introduce a group of auxiliary Adaptive Token Dictionaries into the SR Transformer, establishing our ATD-SR method. The introduced token dictionary can learn prior information from training data and adapt the learned prior to a specific test image through an adaptive refinement step. The refinement strategy not only provides global information to all input tokens but also groups image tokens into categories. Based on the category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens for enhancing input features. The experimental results show that our method achieves the best performance on various single image super-resolution benchmarks.
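One way to picture the category-based self-attention is the sketch below: tokens are assigned to the nearest dictionary entry and attend only within their category, so distant but similar tokens can interact. The nearest-neighbor assignment and plain dot-product attention are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def category_attention(x, dictionary):
    # x: (N, d) image tokens; dictionary: (K, d) learned token dictionary
    sim = F.normalize(x, dim=-1) @ F.normalize(dictionary, dim=-1).T   # (N, K)
    cats = sim.argmax(dim=-1)                                          # category per token
    out = torch.zeros_like(x)
    for c in cats.unique():
        idx = (cats == c).nonzero(as_tuple=True)[0]
        xs = x[idx]                                                    # tokens in one category
        attn = F.softmax(xs @ xs.T / xs.shape[-1] ** 0.5, dim=-1)
        out[idx] = attn @ xs                                           # attend within the category
    return out
```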
ISBN: (Print) 9798350353006
Point cloud registration is a critical and challenging task in computer vision. Recent advancements have predominantly embraced a coarse-to-fine matching mechanism, in which the key is to match superpoints located in patches with inter-frame consistent structures. However, previous methods still face challenges with ambiguous matching, because interference information aggregated from irrelevant regions may disturb the capture of inter-frame consistency relations, leading to wrong matches. To address this issue, we propose the Dynamic Cues-Assisted Transformer (DCATr). Firstly, interference from irrelevant regions is greatly reduced by constraining attention to certain cues, i.e., regions with highly correlated structures of potential corresponding superpoints. Secondly, cues-assisted attention is designed to mine inter-frame consistency relations, with more attention assigned to pairs with high consistency confidence during feature aggregation. Finally, a dynamic updating scheme is proposed to facilitate mining richer consistency information, further improving the aggregated features' distinctiveness and relieving matching ambiguity. Extensive evaluations on indoor and outdoor standard benchmarks demonstrate that DCATr outperforms all state-of-the-art methods.
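Constraining attention to cues can be expressed as ordinary attention with a boolean mask, as in this sketch; the mask construction and single-head form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cues_assisted_attention(q, k, v, cue_mask):
    # q, k, v: (N, d) superpoint features; cue_mask: (N, N) boolean,
    # True where attention is allowed. Each row is assumed to keep at
    # least one True entry (e.g. the superpoint itself) to avoid NaNs.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~cue_mask, float("-inf"))   # suppress irrelevant regions
    return F.softmax(scores, dim=-1) @ v
```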
ISBN: (Print) 9798350353006
To interpret vision Transformers, post-hoc explanations assign salience scores to input pixels, providing human-understandable heatmaps. However, whether these interpretations reflect the true rationales behind the model's output is still underexplored. To address this gap, we study the faithfulness criterion of explanations: the assigned salience scores should represent the influence of the corresponding input pixels on the model's predictions. To evaluate faithfulness, we introduce the Salience-guided Faithfulness Coefficient (SaCo), a novel evaluation metric leveraging essential information about the salience distribution. Specifically, we conduct pair-wise comparisons among distinct pixel groups and then aggregate the differences in their salience scores, resulting in a coefficient that indicates the explanation's degree of faithfulness. Our explorations reveal that current metrics struggle to differentiate between advanced explanation methods and Random Attribution, thereby failing to capture the faithfulness property. In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations. Furthermore, SaCo demonstrates that the use of gradients and multi-layer aggregation can markedly enhance the faithfulness of attention-based explanations, shedding light on potential paths for advancing vision Transformer explainability.
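A hedged reading of the pairwise-comparison idea is sketched below: for every pair of pixel groups, agreement between the salience ordering and the measured influence ordering adds to the coefficient, and disagreement subtracts; the exact weighting and normalization in the paper may differ.

```python
from itertools import combinations

def saco(salience, influence):
    # salience[g]: total salience assigned to pixel group g
    # influence[g]: measured change in the prediction when group g is perturbed
    total, norm = 0.0, 0.0
    for i, j in combinations(range(len(salience)), 2):
        w = abs(salience[i] - salience[j])
        agree = (salience[i] - salience[j]) * (influence[i] - influence[j]) >= 0
        total += w if agree else -w
        norm += w
    return total / norm if norm else 0.0   # in [-1, 1]; higher = more faithful
```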
ISBN: (Print) 9798350353006
Despite the remarkable success of vision Transformers (ViT) across diverse fields in computer vision, they have a clear drawback of expensive adaptation cost for downstream tasks due to their increased scale. To address this, Visual Prompt Tuning (VPT) incorporates learnable parameters in the input space of ViT. While freezing the ViT backbone and tuning only the prompts, it exhibits performance superior to full fine-tuning. However, despite this outstanding advantage, we point out that VPT may lead to serious unfairness in downstream classification. Initially, we investigate the causes of unfairness in VPT, identifying the biased pre-trained ViT as a principal factor. Motivated by this observation, we propose Fair Visual Prompt Tuning (Fair-VPT), which removes biased information in the pre-trained ViT while adapting it to downstream classification tasks. To this end, we categorize prompts into "cleaner prompts" and "target prompts". Based on this, we encode the class token in two different ways, by either masking or not masking the target prompts in the self-attention process. These encoded tokens are trained with distinct objective functions, resulting in the inclusion of different information in the target and cleaner prompts. Moreover, we introduce a disentanglement loss based on contrastive learning to further decorrelate them. In experiments across diverse benchmarks, the proposed method demonstrates the best performance in terms of balanced classification accuracy and fairness.
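The dual encoding of the class token can be emulated with a standard attention layer and a boolean mask, as below; using a single `nn.MultiheadAttention` as a stand-in for a full ViT block, and the exact masking pattern, are assumptions for illustration.

```python
import torch
import torch.nn as nn

def dual_class_tokens(attn, cls_tok, cleaner, target):
    # attn: nn.MultiheadAttention built with batch_first=True
    # cls_tok: (B, 1, d); cleaner: (B, Pc, d); target: (B, Pt, d)
    seq = torch.cat([cls_tok, cleaner, target], dim=1)
    length = seq.shape[1]
    mask = torch.zeros(length, length, dtype=torch.bool)
    mask[0, 1 + cleaner.shape[1]:] = True     # class token cannot see target prompts
    z_all, _ = attn(seq, seq, seq, need_weights=False)
    z_masked, _ = attn(seq, seq, seq, attn_mask=mask, need_weights=False)
    # Two class-token encodings, each trained with its own objective.
    return z_all[:, 0], z_masked[:, 0]
```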
ISBN: (Print) 9798350353006
Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder and the use of coarse-grained training data that lacks detailed, region-specific captions. To address this, we introduce RegionGPT (RGPT for short), a novel framework designed for complex region-level captioning and understanding. RGPT enhances the spatial awareness of regional representations with simple yet effective modifications to existing visual encoders in VLMs. We further improve performance on tasks requiring a specific output scope by integrating task-guided instruction prompts during both training and inference phases, while maintaining the model's versatility for general-purpose tasks. Additionally, we develop an automated region-caption data-generation pipeline, enriching the training set with detailed region-level captions. We demonstrate that a universal RGPT model can be effectively applied across a range of region-level tasks, significantly enhancing performance on, among others, complex region description, reasoning, object classification, and referring-expression comprehension. Code will be released at the project page.
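A simple, hypothetical way to obtain a region-level token for an LLM, mask-average pooling over the encoder's feature map, is sketched below; RGPT's actual encoder modifications are not specified here and may differ substantially.

```python
import torch

def region_feature(feat_map, region_mask):
    # feat_map: (d, H, W) visual features; region_mask: (H, W) boolean,
    # assumed non-empty. Returns one (d,) token summarizing the region.
    d = feat_map.shape[0]
    flat = feat_map.reshape(d, -1)
    return flat[:, region_mask.reshape(-1)].mean(dim=1)
```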
ISBN: (Print) 9798350353006
Image datasets are essential not only for validating existing methods in computer vision but also for developing new ones. Many image datasets exist, consisting of trichromatic intensity images taken with RGB cameras, which are designed to replicate human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although previous spectro-polarimetric datasets exist, they have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image counts. Here, we introduce two spectro-polarimetric datasets, consisting of trichromatic Stokes images and hyperspectral Stokes images. These datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our dataset in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate the spectral dependency of shape-from-polarization methods. As such, the proposed dataset promises a foundation for data-driven spectro-polarimetric imaging and vision research.
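For readers unfamiliar with Stokes images, the standard polarization quantities can be derived per pixel as follows; these are textbook polarimetry formulas rather than anything dataset-specific.

```python
import numpy as np

def polarization_stats(stokes):
    # stokes: (..., 4) array holding (S0, S1, S2, S3) per pixel
    s0, s1, s2, s3 = np.moveaxis(stokes, -1, 0)
    eps = 1e-8                                    # guard against division by zero
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)    # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)               # angle of linear polarization
    docp = np.abs(s3) / (s0 + eps)                # degree of circular polarization
    return dolp, aolp, docp
```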
ISBN: (Print) 9798350353013; 9798350353006
Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance. Although yielding impressive results, the impact of this language prior, particularly in terms of generalization and robustness, remains unexplored. In this paper, we address this gap by quantifying the impact of the prior and introduce methods to benchmark its effectiveness across various settings. We generate "low-level" sentences that convey object-centric, three-dimensional spatial relationships, incorporate them as additional language priors, and evaluate their downstream impact on depth estimation. Our key finding is that current language-guided depth estimators perform optimally only with scene-level descriptions and, counter-intuitively, fare worse with low-level descriptions. Despite leveraging additional data, these methods are not robust to directed adversarial attacks, and their performance declines as distribution shift increases. Finally, to provide a foundation for future research, we identify points of failure and offer insights to better understand these shortcomings. With an increasing number of methods using language for depth estimation, our findings highlight the opportunities and pitfalls that require careful consideration for effective deployment in real-world settings.
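A toy version of the "low-level" sentence generation might look like the template below; the template wording and the assumption that per-object depths are available are both illustrative.

```python
def low_level_sentence(name_a, depth_a, name_b, depth_b):
    """Emit an object-centric spatial-relation sentence from two depths."""
    rel = ("closer to the camera than" if depth_a < depth_b
           else "farther from the camera than")
    return f"The {name_a} is {rel} the {name_b}."

# low_level_sentence("chair", 1.2, "sofa", 2.5)
# -> 'The chair is closer to the camera than the sofa.'
```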
ISBN: (Print) 9798350353006
Human comprehension of a video stream is naturally broad: in a few instants, we are able to understand what is happening, the relevance and relationships of objects, and forecast what will follow in the near future, all at once. We believe that, to effectively transfer such a holistic perception to intelligent machines, an important role is played by learning to correlate concepts and to abstract knowledge from different tasks, so as to synergistically exploit them when learning novel skills. To accomplish this, we look for a unified approach to video understanding that combines shared temporal modelling of human actions with minimal overhead, to support multiple downstream tasks and enable cooperation when learning novel skills. We then propose EgoPack, a solution that creates a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights, like a backpack of skills that a robot can carry around and use when needed. We demonstrate the effectiveness and efficiency of our approach on four Ego4D benchmarks, outperforming current state-of-the-art methods. Project webpage: ***/EgoPack.
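Loosely, the "backpack of skills" can be pictured as a store of per-task prototypes that a new task queries for additional context, as in this sketch; the mean-pooled prototypes and similarity-weighted mixing are assumptions, and the real EgoPack interaction is richer.

```python
import torch

class TaskBackpack:
    def __init__(self):
        self.perspectives = {}                    # task name -> (d,) prototype

    def pack(self, task, features):
        # Summarize a learned task by mean-pooling its (N, d) features.
        self.perspectives[task] = features.mean(dim=0)

    def borrow(self, query):
        # Augment a new task's (d,) feature with similarity-weighted prototypes.
        if not self.perspectives:
            return query
        protos = torch.stack(list(self.perspectives.values()))   # (T, d)
        weights = torch.softmax(protos @ query, dim=0)           # (T,)
        return query + (weights[:, None] * protos).sum(dim=0)
```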