Search Results - Inner Mongolia University Library

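Help

Field codes:

T = title (book title, article title); A = author (responsible party); K = subject term; P = publication name; PU = publisher name; O = institution (author affiliation, degree-granting institution, patent applicant); L = Chinese Library Classification number; C = subject classification number; U = all fields; Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)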
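
The query syntax described above (field code, `=`, value, and uppercase operators with one space on each side) can be sketched as a small string builder. This is only an illustration: the helper names `term`, `combine`, and `group` are hypothetical and are not part of the library's system, and the sketch only assembles the query text without sending it anywhere.

```python
# Hypothetical helpers for assembling an advanced-search expression.
# The field codes and operators are the ones listed in the help text;
# the function names are assumptions made for this sketch.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value clause after checking the field code."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with an uppercase operator, one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(clauses)


def group(clause: str) -> str:
    """Wrap a clause in parentheses so it is treated as one unit."""
    return f"({clause})"


# Rebuild Example 1 from the help text.
example_1 = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_1)
```

Validating the field code and operator up front keeps a typo such as a lowercase `and` from silently producing a query the search form would reject.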

Refine search results (record counts)

Document type

20,994 conference papers
99 books
85 journal articles
1 thesis

Holdings

21,178 electronic documents
1 print holding

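Subject classification

13,603 Engineering
- 11,179 Computer Science and Technology...
- 2,631 Mechanical Engineering
- 2,542 Software Engineering
- 990 Optical Engineering
- 849 Electrical Engineering
- 676 Control Science and Engineering
- 487 Information and Communication Engineering
- 242 Instrument Science and Technology
- 215 Surveying and Mapping Science and Technology
- 159 Biomedical Engineering (conferrable...
- 150 Bioengineering
- 139 Electronic Science and Technology (conferrable...
- 69 Safety Science and Engineering
- 67 Chemical Engineering and Technology
- 55 Architecture
- 53 Civil Engineering
- 43 Mechanics (conferrable as Engineering, ...
- 41 Aeronautical and Astronautical Science and Tech...
3,462 Medicine
- 3,452 Clinical Medicine
- 41 Basic Medicine (conferrable as Medicine...
2,483 Science
- 1,247 Mathematics
- 1,213 Physics
- 446 Statistics (conferrable as Science, ...
- 418 Biology
- 269 Systems Science
- 67 Chemistry
424 Management
- 218 Management Science and Engineering (conferrable...
- 217 Library, Information and Archives Manage...
- 43 Business Administration
144 Art
- 142 Design (conferrable as Art...
41 Law
31 Agriculture
12 Economics
10 Education
6 Literature
3 Military Science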

Topic

8,072 computer vision
2,879 pattern recognit...
2,859 training
1,808 computational mo...
1,718 visualization
1,478 cameras
1,381 shape
1,374 face recognition
1,364 three-dimensiona...
1,342 feature extracti...
1,269 image segmentati...
1,156 robustness
1,109 semantics
982 layout
978 object detection
953 computer archite...
952 benchmark testin...
931 codes
918 object recogniti...
899 computer science

Institution

174 univ sci & techn...
154 carnegie mellon ...
149 univ chinese aca...
144 chinese univ hon...
110 microsoft resear...
104 zhejiang univ pe...
98 swiss fed inst t...
93 tsinghua univ pe...
92 tsinghua univers...
90 microsoft res as...
88 shanghai ai lab ...
83 zhejiang univers...
76 alibaba grp peop...
74 hong kong univ s...
73 university of sc...
72 peking univ peop...
68 shanghai jiao to...
68 university of ch...
66 google res mount...
66 univ oxford oxfo...

Author

83 van gool luc
71 zhang lei
60 timofte radu
49 yang yi
49 luc van gool
48 xiaoou tang
43 darrell trevor
43 tian qi
42 loy chen change
42 sun jian
41 qi tian
37 vasconcelos nuno
37 liu yang
37 chen xilin
37 li fei-fei
36 liu xiaoming
36 shan shiguang
36 li stan z.
36 torralba antonio
33 zhou jie

Language

21,137 English
31 Chinese
5 Turkish
4 Other
2 Japanese

Search condition: "Any field=2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011"

21,179 records in total; records 131-140 are shown below.

On the Faithfulness of Vision Transformer Explanations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wu, Junyi; Kang, Weitai; Tang, Hao; Hong, Yuan; Yan, Yan

Affiliations: IIT Dept Comp Sci Chicago IL 60616 USA; Carnegie Mellon Univ Robot Inst Pittsburgh PA 15213 USA; Univ Connecticut Dept Comp Sci Storrs CT USA

ISBN: (print) 9798350353006

To interpret Vision Transformers, post-hoc explanations assign salience scores to input pixels, providing human-understandable heatmaps. However, whether these interpretations reflect true rationales behind the model's output is still underexplored. To address this gap, we study the faithfulness criterion of explanations: the assigned salience scores should represent the influence of the corresponding input pixels on the model's predictions. To evaluate faithfulness, we introduce Salience-guided Faithfulness Coefficient (SaCo), a novel evaluation metric leveraging essential information of salience distribution. Specifically, we conduct pair-wise comparisons among distinct pixel groups and then aggregate the differences in their salience scores, resulting in a coefficient that indicates the explanation's degree of faithfulness. Our explorations reveal that current metrics struggle to differentiate between advanced explanation methods and Random Attribution, thereby failing to capture the faithfulness property. In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations. Furthermore, our SaCo demonstrates that the use of gradient and multi-layer aggregation can markedly enhance the faithfulness of attention-based explanation, shedding light on potential paths for advancing Vision Transformer explainability.

Keywords: Explainability Transformer

Transductive Zero-Shot and Few-Shot CLIP

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Martin, Segolene; Huang, Yunshi; Shakeri, Fereshteh; Pesquet, Jean-Christophe; Ben Ayed, Ismail

Affiliations: Univ Paris Saclay CVN Cent Supelec INRIA Paris France; ETS Montreal Montreal PQ Canada

ISBN: (print) 9798350353006

Transductive inference has been widely investigated in few-shot image classification, but completely overlooked in the recent, fast-growing literature on adapting vision-language models like CLIP. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge, in which inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently. We initially construct informative vision-text probability features, leading to a classification problem on the unit simplex set. Inspired by Expectation-Maximization (EM), our optimization-based classification objective models the data probability distribution for each class using a Dirichlet law. The minimization problem is then tackled with a novel block Majorization-Minimization algorithm, which simultaneously estimates the distribution parameters and class assignments. Extensive numerical experiments on 11 datasets underscore the benefits and efficacy of our batch inference approach. On zero-shot tasks with test batches of 75 samples, our approach yields a nearly 20% improvement in ImageNet accuracy over CLIP's zero-shot performance. Additionally, we outperform state-of-the-art methods in the few-shot setting. The code is available at: https://***/SegoleneMartin/transductive-CLIP.

Keywords: expectation-maximization few-shot text-vision models transductive learning zero-shot

Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jeon, Yujin; Cho, Eunsue; Kim, Youngchan; Moon, Yunseong; Omer, Khalid; Heide, Felix; Baek, Seung-Hwan

Affiliations: POSTECH Pohang South Korea; Meta Menlo Pk CA USA; Princeton Univ Princeton NJ 08544 USA

ISBN: (print) 9798350353006

Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Many image datasets exist, consisting of trichromatic intensity images taken with RGB cameras, which are designed to replicate human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although there are previous spectro-polarimetric datasets, they have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image count. Here, we introduce two spectro-polarimetric datasets, consisting of trichromatic Stokes images and hyperspectral Stokes images. These datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our dataset in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate spectral dependency of shape-from-polarization methods. As such, the proposed dataset promises a foundation for data-driven spectro-polarimetric imaging and vision research.

Keywords: Computational Imaging

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Xinyao; Li, Yuke; Du, Zhekai; Li, Fengling; Lu, Ke; Li, Jingjing

Affiliations: Univ Elect Sci & Technol China Chengdu Peoples R China; Boston Coll Chestnut Hill MA 02167 USA; Univ Technol Sydney Sydney NSW Australia

ISBN: (print) 9798350353006

Large vision-language models (VLMs) like CLIP have demonstrated good zero-shot learning performance in the unsupervised domain adaptation task. Yet, most transfer approaches for VLMs focus on either the language or visual branches, overlooking the nuanced interplay between both modalities. In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation. Leveraging insights from modality gap studies, we craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components. Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information while maintaining modality-specific nuances. We align features across domains using a modality discriminator. Comprehensive evaluations on three benchmarks reveal our approach sets a new state-of-the-art with minimal computational costs. Code: https://***/TL-UESTC/UniMoS.

Keywords: deep learning Unsupervised domain adaptation vision-language model

Language Model Guided Interpretable Video Action Reasoning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Ning; Zhu, Guangming; Li, H. S.; Zhang, Liang; Shah, Syed Afaq Ali; Bennamoun, Mohammed

Affiliations: Xidian Univ Xian Peoples R China; Edith Cowan Univ Joondalup Australia; Univ Western Australia Perth Australia

ISBN: (print) 9798350353006

While neural networks have excelled in video action recognition tasks, their "black-box" nature often obscures the understanding of their decision-making processes. Recent approaches used inherently interpretable models to analyze video actions in a manner akin to human reasoning. These models, however, usually fall short in performance compared to their "black-box" counterparts. In this work, we present a new framework named Language-guided Interpretable Action recognition framework (LaIAR). LaIAR leverages knowledge from language models to enhance both the recognition capabilities and the interpretability of video models. In essence, we redefine the problem of understanding video model decisions as a task of aligning video and language models. Using the logical reasoning captured by the language model, we steer the training of the video model. This integrated approach not only improves the video model's adaptability to different domains but also boosts its overall performance. Extensive experiments on two complex video action datasets, Charades and CAD-120, validate the improved performance and interpretability of our LaIAR framework. The code of LaIAR is available at https://***/NingWang2049/LaIAR.

Keywords: action recognition Explainable AI video understanding

Convolutional Prompting meets Language Models for Continual Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Roy, Anurag; Moulick, Riddhiman; Verma, Vinay K.; Ghosh, Saptarshi; Das, Abir

Affiliations: IIT Kharagpur Kharagpur W Bengal India; IML Amazon India Hyderabad India

ISBN: (print) 9798350353006

Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in the absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts, which can be inefficient in sharing knowledge across tasks, leading to inferior performance. In addition, the lack of fine-grained layer-specific prompts does not allow these to fully express the strength of the prompts for CL. We address these limitations by proposing ConvPrompt, a novel convolutional prompt creation mechanism that maintains layer-wise shared embeddings, enabling both layer-specific learning and better concept transfer across tasks. The intelligent use of convolution enables us to maintain a low parameter overhead without compromising performance. We further leverage Large Language Models to generate fine-grained text descriptions of each category, which are used to get task similarity and dynamically decide the number of prompts to be learned. Extensive experiments demonstrate the superiority of ConvPrompt, which improves SOTA by 3% with significantly less parameter overhead. We also perform strong ablation over various modules to disentangle the importance of different components.

Keywords: Continual Learning Language Models Prompt Tuning Vision Transformer

GRAM: Global Reasoning for Multi-Page VQA

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Blau, Tsachi; Fogel, Sharon; Ronen, Roi; Goltst, Alona; Per, Shahar Tsi; Ben Avraham, Elad; Aberdam, Aviad; Ganz, Roy; Litman, Ron

Affiliations: Technion Haifa Israel; AWS AI Labs Shanghai Peoples R China

ISBN: (print) 9798350353006

The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally-heavy pretraining. To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens, facilitating the flow of information across pages for global reasoning. To enforce that our model utilizes the newly introduced document tokens, we propose a tailored bias adaptation method. For additional computational savings during decoding, we introduce an optional compression stage using our compression-transformer (C-Former), reducing the encoded sequence length, thereby allowing a tradeoff between quality and latency. Extensive experiments showcase GRAM's state-of-the-art performance on the benchmarks for multi-page DocVQA, demonstrating the effectiveness of our approach.

Keywords: Document Understanding Long Sequence Processing Vision Language Models

Building Vision-Language Models on Solid Foundations with Masked Distillation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sameni, Sepehr; Kafle, Kushal; Tan, Hao; Jenni, Simon

Affiliations: Univ Bern Bern Switzerland; Adobe Res San Jose CA USA

ISBN: (print) 9798350353006

Recent advancements in Vision-Language Models (VLMs) have marked a significant leap in bridging the gap between computer vision and natural language processing. However, traditional VLMs, trained through contrastive learning on limited and noisy image-text pairs, often lack the spatial and linguistic understanding to generalize well to dense vision tasks or less common languages. Our approach, Solid Foundation CLIP (SF-CLIP), circumvents this issue by implicitly building on the solid visual and language understanding of foundational models trained on vast amounts of unimodal data. SF-CLIP integrates contrastive image-text pretraining with a masked knowledge distillation from large foundational text and vision models. This methodology guides our VLM in developing robust text and image representations. As a result, SF-CLIP shows exceptional zero-shot classification accuracy and enhanced image and text retrieval capabilities, setting a new state of the art for ViT-B/16 trained on YFCC15M and CC12M. Moreover, the dense per-patch supervision enhances our zero-shot and linear probe performance in semantic segmentation tasks. A remarkable aspect of our model is its multilingual proficiency, evidenced by strong retrieval results in multiple languages despite being trained predominantly on English data. We achieve all of these improvements without sacrificing the training efficiency through our selective application of masked distillation and the inheritance of teacher word embeddings.

Keywords: CLIP Distillation LLM Multilingual Multimodal Representation Learning

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Turki, Haithem; Agrawal, Vasu; Bulo, Samuel Rota; Porzi, Lorenzo; Kontschieder, Peter; Ramanan, Deva; Zollhofer, Michael; Richardt, Christian

Affiliations: Meta Real Labs Menlo Pk CA 94025 USA; Carnegie Mellon Univ Pittsburgh PA 15213 USA

ISBN: (print) 9798350353006

Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations, such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF against the challenging Eyeful Tower dataset [38] along with other commonly used view synthesis datasets. When compared to state-of-the-art baselines, including recent rasterization-based approaches, we improve error rates by 15-30% while achieving real-time framerates (at least 36 FPS) for virtual-reality resolutions (2K×2K). Project page: https://***/hybrid-nerf/.

Keywords: 3d reconstruction computer vision machine learning neural radiance fields neural rendering novel view synthesis

Improved Visual Grounding through Self-Consistent Explanations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: He, Ruozhen; Cascante-Bonilla, Paola; Yang, Ziyan; Berg, Alexander C.; Ordonez, Vicente

Affiliations: Rice Univ Houston TX 77005 USA; Univ Calif Irvine Irvine CA USA

ISBN: (print) 9798350353006

Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization ("grounding") abilities of these models can be further improved by fine-tuning for self-consistent visual explanations. We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that encourages self-consistency. Specifically, for an input textual phrase, we attempt to generate a paraphrase and finetune the model so that the phrase and paraphrase map to the same region in the image. We posit that this both expands the vocabulary that the model is able to handle, and improves the quality of the object locations highlighted by gradient-based visual explanation methods (e.g. GradCAM). We demonstrate that SelfEQ improves performance on Flickr30k, ReferIt, and RefCOCO+ over a strong baseline method and several prior works. In particular, compared to other methods that do not use any type of box annotations, we obtain 84.07% on Flickr30k (an absolute improvement of 4.69%), 67.40% on ReferIt (an absolute improvement of 7.68%), and 75.10%, 55.49% on RefCOCO+ test sets A and B respectively (an absolute improvement of 3.74% on average).

Keywords: vision and language visual grounding

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021