ISBN:
(Print) 9798350353006
Adapters provide an efficient and lightweight mechanism for adapting trained transformer models to a variety of different tasks. However, they have often been found to be outperformed by other adaptation mechanisms including low-rank adaptation. In this paper, we provide an in-depth study of adapters, their internal structure, as well as various implementation choices. We uncover pitfalls for using adapters and suggest a concrete, improved adapter architecture, called Adapter+, that not only outperforms previous adapter implementations but surpasses a number of other, more complex adaptation mechanisms in several challenging settings. Despite this, our suggested adapter is highly robust and, unlike previous work, requires little to no manual intervention when addressing a novel scenario. Adapter+ reaches state-of-the-art average accuracy on the VTAB benchmark, even without a per-task hyperparameter optimization.
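For intuition, the sketch below shows a generic bottleneck adapter of the kind the paper studies: a down-projection, nonlinearity, and up-projection added as a scaled residual around a frozen transformer sub-layer. The module name, bottleneck width, scaling factor, and zero initialization are illustrative assumptions, not the exact Adapter+ configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter applied after a frozen transformer sub-layer (sketch)."""

    def __init__(self, dim: int, bottleneck: int = 32, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project to a small bottleneck
        self.up = nn.Linear(bottleneck, dim)     # project back to model width
        self.act = nn.GELU()
        self.scale = scale                       # assumed fixed residual scaling
        nn.init.zeros_(self.up.weight)           # start as identity so the frozen model is unchanged
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.up(self.act(self.down(x)))

# Usage: wrap the output of a frozen block; only the adapter parameters are trained.
tokens = torch.randn(2, 197, 768)                # (batch, tokens, dim), e.g. a ViT-B backbone
adapter = BottleneckAdapter(dim=768)
out = adapter(tokens)
```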
ISBN:
(Print) 9798350353006
We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference. More importantly, we show that scaling the test-time resolution beyond the training resolution improves the fidelity of generated point clouds and surfaces. We analyze this phenomenon and draw a link to classifier-free guidance commonly used in diffusion models, demonstrating that both allow trading off fidelity and variability during inference. Experiments on CO3D show that PointInfinity can efficiently generate high-resolution point clouds (up to 131k points, 31× more than Point-E) with state-of-the-art quality.
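As a rough illustration of a fixed-size, resolution-invariant latent, the sketch below reads a variable number of point tokens into a fixed set of learned latents via cross-attention and decodes an arbitrary number of output queries from them, Perceiver-style. The class name, latent count, and dimensions are assumptions; the actual PointInfinity architecture is not specified here.

```python
import torch
import torch.nn as nn

class FixedSizeLatentIO(nn.Module):
    """Read a variable number of point tokens into a fixed-size latent, then
    decode any number of output queries from it (Perceiver-style sketch)."""

    def __init__(self, dim: int = 256, num_latents: int = 64, heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, point_tokens: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        b = point_tokens.shape[0]
        z = self.latents.unsqueeze(0).expand(b, -1, -1)      # (B, M, D) fixed-size latent
        z, _ = self.read(z, point_tokens, point_tokens)      # read: latents attend to input tokens
        out, _ = self.write(queries, z, z)                   # write: output queries attend to latents
        return out

# Train with low-resolution clouds, decode many more queries at test time.
model = FixedSizeLatentIO()
low_res = torch.randn(2, 1024, 256)     # training-resolution point tokens
queries = torch.randn(2, 8192, 256)     # higher-resolution output queries at inference
print(model(low_res, queries).shape)    # torch.Size([2, 8192, 256])
```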
ISBN:
(Print) 9798350353013; 9798350353006
Extracting planes from a 3D scene is useful for downstream tasks in robotics and augmented reality. In this paper we tackle the problem of estimating the planar surfaces in a scene from posed images. Our first finding is that a surprisingly competitive baseline results from combining popular clustering algorithms with recent improvements in 3D geometry estimation. However, such purely geometric methods are understandably oblivious to plane semantics, which are crucial to discerning distinct planes. To overcome this limitation, we propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes. We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches and our strong geometric baseline for the task of plane estimation.
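A minimal sketch of the clustering idea, assuming per-point 3D positions, normals, and learned plane embeddings are available: concatenate geometric and embedding features and run an off-the-shelf clustering algorithm. The feature weights, DBSCAN parameters, and function name are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_planes(points, normals, embeddings, w_geo=1.0, w_emb=0.5, eps=0.3):
    """Cluster 3D points into planes using geometry plus learned plane embeddings.

    points:     (N, 3) 3D positions
    normals:    (N, 3) unit surface normals
    embeddings: (N, E) multi-view consistent plane embeddings
    Weights and DBSCAN parameters here are illustrative assumptions.
    """
    feats = np.concatenate([w_geo * points, w_geo * normals, w_emb * embeddings], axis=1)
    return DBSCAN(eps=eps, min_samples=20).fit_predict(feats)  # -1 marks noise points

# Toy usage with random data just to show the interface.
n = 500
labels = cluster_planes(np.random.rand(n, 3), np.random.rand(n, 3), np.random.rand(n, 8))
```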
ISBN:
(Print) 9798350353013; 9798350353006
The advent of Vision Transformers (ViTs) marks a substantial paradigm shift in the realm of computer vision. ViTs capture the global information of images through self-attention modules, which perform dot product computations among patchified image tokens. While self-attention modules empower ViTs to capture long-range dependencies, the computational complexity grows quadratically with the number of tokens, which is a major hindrance to the practical application of ViTs. Moreover, the self-attention mechanism in deep ViTs is also susceptible to the attention saturation issue. Accordingly, we argue against the necessity of computing the attention scores in every layer, and we propose the Less-Attention Vision Transformer (LaViT), which computes only a few attention operations at each stage and calculates the subsequent feature alignments in other layers via attention transformations that leverage the previously calculated attention scores. This novel approach can mitigate two primary issues plaguing traditional self-attention modules: the heavy computational burden and attention saturation. Our proposed architecture offers superior efficiency and ease of implementation, merely requiring matrix multiplications that are highly optimized in contemporary deep learning frameworks. Moreover, our architecture demonstrates exceptional performance across various vision tasks including classification, detection and segmentation.
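To make the attention-reuse idea concrete, here is a hypothetical layer that mixes values with an attention map cached from an earlier layer instead of computing new query-key dot products. The abstract does not specify LaViT's transformation, so the learned per-head mixing and re-normalization below are one plausible, matrix-multiplication-only stand-in.

```python
import torch
import torch.nn as nn

class AttentionReuseLayer(nn.Module):
    """Reuses a previously computed attention map instead of new Q/K dot products (sketch)."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.head_mix = nn.Linear(heads, heads, bias=False)  # assumed transform of cached attention across heads
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, cached_attn: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h = self.heads
        v = self.v_proj(x).reshape(b, n, h, d // h).transpose(1, 2)   # (B, H, N, Dh)
        a = self.head_mix(cached_attn.permute(0, 2, 3, 1))            # mix (B, N, N, H) over heads
        a = a.permute(0, 3, 1, 2).softmax(dim=-1)                     # back to (B, H, N, N), renormalize
        out = (a @ v).transpose(1, 2).reshape(b, n, d)                # value mixing with the reused attention
        return self.out_proj(out)

# cached_attn would come from an earlier layer's softmaxed attention scores.
x = torch.randn(2, 50, 192)
cached = torch.rand(2, 3, 50, 50).softmax(dim=-1)    # (B, H, N, N)
layer = AttentionReuseLayer(dim=192, heads=3)
print(layer(x, cached).shape)                         # torch.Size([2, 50, 192])
```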
ISBN:
(Print) 9798350353006
This paper introduces 3DFIRES, a novel system for scene-level 3D reconstruction from posed images. Designed to work with as few as one view, 3DFIRES reconstructs the complete geometry of unseen scenes, including hidden surfaces. With multiple view inputs, our method produces full reconstruction within all camera frustums. A key feature of our approach is the fusion of multi-view information at the feature level, enabling the production of coherent and comprehensive 3D reconstruction. We train our system on non-watertight scans from a large-scale real-scene dataset. We show it matches the efficacy of single-view reconstruction methods with only one input and surpasses existing techniques in both quantitative and qualitative measures for sparse-view 3D reconstruction. Project page: https://***/3DFIRES/
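As a generic illustration of feature-level multi-view fusion (not 3DFIRES' actual design, which the abstract does not detail), the sketch below attention-pools per-view features sampled for each 3D query point.

```python
import torch
import torch.nn as nn

class MultiViewFeatureFusion(nn.Module):
    """Fuses per-view features for each 3D query point at the feature level (generic stand-in)."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-view relevance score
        self.merge = nn.Linear(dim, dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (B, Q, V, D) = features for Q query points sampled from V posed views
        w = self.score(view_feats).softmax(dim=2)   # attention over views
        fused = (w * view_feats).sum(dim=2)         # weighted feature-level fusion
        return self.merge(fused)                    # (B, Q, D) fused feature per query point

fusion = MultiViewFeatureFusion()
print(fusion(torch.randn(2, 1000, 4, 128)).shape)   # torch.Size([2, 1000, 128])
```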
ISBN:
(Print) 9798350353013; 9798350353006
We present S(4)Former, a novel approach to training Vision Transformers for Semi-Supervised Semantic Segmentation (S(4)). At its core, S(4)Former employs a Vision Transformer within a classic teacher-student framework, and then leverages three novel technical ingredients: PatchShuffle as a parameter-free perturbation technique, Patch-Adaptive Self-Attention (PASA) as a fine-grained feature modulation method, and the innovative Negative Class Ranking (NCR) regularization loss. Based on these regularization modules aligned with Transformer-specific characteristics across the image input, feature, and output dimensions, S(4)Former exploits the Transformer's ability to capture and differentiate consistent global contextual information in unlabeled images. Overall, S(4)Former not only defines a new state of the art in S(4) but also maintains a streamlined and scalable architecture. Being readily compatible with existing frameworks, S(4)Former achieves strong improvements (up to 4.9%) on benchmarks like Pascal VOC 2012, COCO, and Cityscapes, with varying amounts of labeled data. The code is at https://***/JoyHuYY1412/S4Former.
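PatchShuffle is described only as a parameter-free perturbation; a straightforward reading is to randomly permute an image's non-overlapping patches, as in the sketch below. The patch size and per-image permutation are assumptions.

```python
import torch

def patch_shuffle(images: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Parameter-free perturbation: randomly permute the spatial patches of each image (sketch)."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    x = images.reshape(b, c, gh, patch, gw, patch).permute(0, 2, 4, 1, 3, 5)   # (B, gh, gw, C, p, p)
    x = x.reshape(b, gh * gw, c, patch, patch)
    perm = torch.stack([torch.randperm(gh * gw) for _ in range(b)])            # independent permutation per image
    x = torch.stack([x[i, perm[i]] for i in range(b)])
    x = x.reshape(b, gh, gw, c, patch, patch).permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, c, h, w)

# Used as a consistency perturbation on unlabeled images in a teacher-student setup.
shuffled = patch_shuffle(torch.randn(4, 3, 224, 224))
```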
ISBN:
(Print) 9798350353006
When we look around and perform complex tasks, how we see and selectively process what we see is crucial. However, the lack of this visual search mechanism in current multimodal LLMs (MLLMs) hinders their ability to focus on important visual details, especially when handling high-resolution and visually crowded images. To address this, we introduce V*, an LLM-guided visual search mechanism that employs the world knowledge in LLMs for efficient visual querying. When combined with an MLLM, this mechanism enhances collaborative reasoning, contextual understanding, and precise visual grounding. This integration results in a new MLLM meta-architecture, named Show, sEArch, and TelL (SEAL). We further create V*Bench, a benchmark specifically designed to evaluate MLLMs in their ability to process high-resolution images and focus on visual details. Our study highlights the necessity of incorporating visual search capabilities into multimodal systems. The code is available here.
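A hypothetical sketch of an LLM-guided visual search loop, purely for illustration: a proposer (standing in for the LLM's world knowledge) suggests a sub-region, the image is cropped, and a grounding model re-checks until it is confident. The callables, threshold, and step budget are invented for this sketch and are not the SEAL/V* implementation.

```python
from typing import Callable, Tuple
import numpy as np

def visual_search(image: np.ndarray, target: str,
                  propose_region: Callable[[np.ndarray, str], Tuple[int, int, int, int]],
                  detect: Callable[[np.ndarray, str], float],
                  threshold: float = 0.5, max_steps: int = 3):
    """Iteratively zoom into LLM-proposed regions until the target is grounded confidently (hypothetical)."""
    crop = image
    for _ in range(max_steps):
        score = detect(crop, target)
        if score >= threshold:
            return crop, score                   # target localized with enough confidence
        x0, y0, x1, y1 = propose_region(crop, target)
        crop = crop[y0:y1, x0:x1]                # zoom into the proposed sub-region and retry
    return crop, detect(crop, target)

# Toy usage with stub callables on a random "high-resolution" image.
img = np.random.rand(2048, 2048, 3)
found, conf = visual_search(
    img, "red mug",
    propose_region=lambda im, t: (0, 0, im.shape[1] // 2, im.shape[0] // 2),
    detect=lambda im, t: 1.0 / max(im.shape[0] / 256, 1.0),
)
```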
ISBN:
(Print) 9798350353006
While Transformers have rapidly gained popularity in various computer vision applications, post-hoc explanations of their internal mechanisms remain largely unexplored. Vision Transformers extract visual information by representing image regions as transformed tokens and integrating them via attention weights. However, existing post-hoc explanation methods merely consider these attention weights, neglecting crucial information from the transformed tokens, which fails to accurately illustrate the rationales behind the models' predictions. To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects. Specifically, we quantify token transformation effects by measuring changes in token lengths and correlations in their directions pre- and post-transformation. Moreover, we develop initialization and aggregation rules to integrate both attention weights and token transformation effects across all layers, capturing holistic token contributions throughout the model. Experimental results on segmentation and perturbation tests demonstrate the superiority of our proposed TokenTM compared to state-of-the-art Vision Transformer explanation methods.
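The abstract defines the transformation effect in terms of token length changes and directional correlations; a minimal reading of that measurement, with an assumed way of combining the two terms, looks like this:

```python
import torch
import torch.nn.functional as F

def token_transformation_effect(pre: torch.Tensor, post: torch.Tensor) -> torch.Tensor:
    """Scores how strongly each token is transformed by a layer (sketch).

    Combines the relative change in token length with the directional correlation
    between pre- and post-transformation tokens; the exact combination used by
    TokenTM is not given in the abstract, so the product below is an assumption.
    """
    length_change = post.norm(dim=-1) / (pre.norm(dim=-1) + 1e-6)   # relative change in token length
    direction_corr = F.cosine_similarity(pre, post, dim=-1)         # directional correlation
    return length_change * direction_corr                           # per-token transformation effect

# pre/post: token states before and after one transformer layer, shape (B, N, D).
pre, post = torch.randn(1, 197, 768), torch.randn(1, 197, 768)
effects = token_transformation_effect(pre, post)   # would be aggregated with attention weights across layers
```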
ISBN:
(Print) 9798350353006
We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localization to carefully design 10 pre-training tasks with large-scale annotated data. These tasks resemble downstream tasks across different domains and the annotations are cheap to obtain. We demonstrate that, compared to current screenshot pre-training objectives, our innovative pre-training method significantly enhances the performance of image-to-text models on nine varied and popular downstream tasks - up to 76.1% improvement on Table Detection, and at least 1% on Widget Captioning.
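To illustrate how HTML structure plus rendered boxes can yield cheap supervision, the sketch below builds one hypothetical pre-training target: given a query point on the screenshot, recover the text of the smallest element containing it. The paper's actual 10 tasks are not enumerated in the abstract, so this task formulation is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Element:
    tag: str                          # HTML tag, e.g. "button"
    bbox: Tuple[int, int, int, int]   # rendered box (x0, y0, x1, y1) on the screenshot
    text: str                         # visible text content

def grounding_target(elements: List[Element], point: Tuple[int, int]) -> Optional[str]:
    """Hypothetical pre-training target: text of the smallest HTML element containing a query point."""
    x, y = point
    hits = [e for e in elements if e.bbox[0] <= x <= e.bbox[2] and e.bbox[1] <= y <= e.bbox[3]]
    if not hits:
        return None
    smallest = min(hits, key=lambda e: (e.bbox[2] - e.bbox[0]) * (e.bbox[3] - e.bbox[1]))
    return f"<{smallest.tag}> {smallest.text}"

elems = [Element("div", (0, 0, 800, 600), "page"), Element("button", (100, 100, 200, 140), "Submit")]
print(grounding_target(elems, (150, 120)))   # "<button> Submit"
```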
ISBN:
(Print) 9798350353006
Vision-Language Models (VLMs), such as CLIP, exhibit strong image-text comprehension abilities, facilitating advances in several downstream tasks such as zero-shot image classification, image-text retrieval, and text-to-image generation. However, the compositional reasoning abilities of existing VLMs remain subpar. The root of this limitation lies in the inadequate alignment between the images and captions in the pretraining datasets. Additionally, the current contrastive learning objective fails to focus on fine-grained grounding components like relations, actions, and attributes, resulting in "bag-of-words" representations. We introduce a simple and effective method to improve compositional reasoning in VLMs. Our method better leverages available datasets by refining and expanding the standard image-text contrastive learning framework. Our approach does not require specific annotations and does not incur extra parameters. When integrated with CLIP, our technique yields notable improvement over state-of-the-art baselines across five vision-language compositional benchmarks.
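One common, annotation-free way to refine the standard image-text contrastive objective toward compositionality is to add rule-generated hard-negative captions (e.g., with swapped attributes or relations); the sketch below shows that variant, which may differ from the paper's actual refinement.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_with_hard_negatives(img, txt, neg_txt, temperature=0.07):
    """InfoNCE-style image-text loss extended with hard-negative captions (assumed instantiation).

    img, txt, neg_txt: L2-normalized embeddings of shape (B, D); neg_txt holds
    embeddings of perturbed captions for the same images.
    """
    logits = img @ torch.cat([txt, neg_txt], dim=0).t() / temperature   # (B, 2B): positives + hard negatives
    targets = torch.arange(img.shape[0])                                # i-th caption matches i-th image
    return F.cross_entropy(logits, targets)

b, d = 8, 512
img = F.normalize(torch.randn(b, d), dim=-1)
txt = F.normalize(torch.randn(b, d), dim=-1)
neg = F.normalize(torch.randn(b, d), dim=-1)
loss = contrastive_loss_with_hard_negatives(img, txt, neg)
```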