ISBN (Print): 9798350353013; 9798350353006
Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance. Although yielding impressive results, the impact of the language prior, particularly in terms of generalization and robustness, remains unexplored. In this paper, we address this gap by quantifying the impact of this prior and introduce methods to benchmark its effectiveness across various settings. We generate "low-level" sentences that convey object-centric, three-dimensional spatial relationships, incorporate them as additional language priors, and evaluate their downstream impact on depth estimation. Our key finding is that current language-guided depth estimators perform optimally only with scene-level descriptions and, counter-intuitively, fare worse with low-level descriptions. Despite leveraging additional data, these methods are not robust to directed adversarial attacks, and their performance declines as distribution shift increases. Finally, to provide a foundation for future research, we identify points of failure and offer insights to better understand these shortcomings. With an increasing number of methods using language for depth estimation, our findings highlight the opportunities and pitfalls that require careful consideration for effective deployment in real-world settings.
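To make the idea of a "low-level" language prior concrete, below is a minimal sketch of how object-centric spatial sentences could be generated from 3D object centroids. The function name, coordinate convention, threshold, and phrasing templates are illustrative assumptions, not the paper's generation pipeline.

```python
# Illustrative sketch: generating "low-level" object-centric spatial sentences
# from 3D object centroids. Names, thresholds, and phrasing are hypothetical;
# the paper's own generation pipeline may differ.
from itertools import combinations

def spatial_sentence(name_a, c_a, name_b, c_b, min_gap=0.10):
    """Describe the relation of object A to object B from the camera's view.

    c_a, c_b are (x, y, z) centroids in metres, camera coordinates:
    +x right, +y down, +z away from the camera.
    """
    phrases = []
    dx, dy, dz = (c_a[i] - c_b[i] for i in range(3))
    if abs(dx) > min_gap:
        phrases.append("to the right of" if dx > 0 else "to the left of")
    if abs(dy) > min_gap:
        phrases.append("below" if dy > 0 else "above")
    if abs(dz) > min_gap:
        phrases.append("farther from the camera than" if dz > 0
                       else "closer to the camera than")
    if not phrases:
        return f"the {name_a} is right next to the {name_b}"
    return f"the {name_a} is " + " and ".join(phrases) + f" the {name_b}"

objects = [("chair", (-0.6, 0.1, 2.0)),
           ("table", (0.3, 0.2, 2.8)),
           ("lamp", (0.4, -0.9, 2.7))]
low_level_prior = [spatial_sentence(a, ca, b, cb)
                   for (a, ca), (b, cb) in combinations(objects, 2)]
print("\n".join(low_level_prior))
```

Sentences of this form can then be concatenated with (or substituted for) the usual scene-level caption before being fed to the language-guided depth estimator.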
ISBN (Print): 9798350353006
Human comprehension of a video stream is naturally broad: in a few instants, we are able to understand what is happening, the relevance and relationship of objects, and forecast what will follow in the near future, everything all at once. We believe that, to effectively transfer such a holistic perception to intelligent machines, an important role is played by learning to correlate concepts and to abstract knowledge from different tasks, so that they can be exploited synergistically when learning novel skills. To accomplish this, we look for a unified approach to video understanding which combines shared temporal modelling of human actions with minimal overhead, to support multiple downstream tasks and enable cooperation when learning novel skills. We then propose EgoPack, a solution that creates a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights, as a backpack of skills that a robot can carry around and use when needed. We demonstrate the effectiveness and efficiency of our approach on four Ego4D benchmarks, outperforming current state-of-the-art methods. Project webpage: ***/EgoPack.
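As a rough illustration of the "backpack of skills" idea, the sketch below keeps a frozen bank of per-task prototype embeddings and lets the temporal features of a new clip attend over them as an auxiliary signal. Module names, dimensions, and the task abbreviations are assumptions for illustration, not EgoPack's actual implementation.

```python
# Minimal sketch: frozen per-task prototype banks ("the backpack") that a new
# task head can attend over. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TaskBackpack(nn.Module):
    def __init__(self, dim=256, prototypes_per_task=16, tasks=("ar", "lta", "oscc")):
        super().__init__()
        # One bank of previously learned, now frozen prototypes per past task.
        self.banks = nn.ParameterDict({
            t: nn.Parameter(torch.randn(prototypes_per_task, dim), requires_grad=False)
            for t in tasks
        })
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, clip_features):
        # clip_features: (batch, time, dim) temporal features of the current video.
        bank = torch.cat(list(self.banks.values()), dim=0)            # (P_total, dim)
        bank = bank.unsqueeze(0).expand(clip_features.size(0), -1, -1)
        borrowed, _ = self.attn(query=clip_features, key=bank, value=bank)
        return clip_features + borrowed                                # enriched features

features = torch.randn(2, 8, 256)            # two clips, eight temporal tokens
enriched = TaskBackpack()(features)
print(enriched.shape)                        # torch.Size([2, 8, 256])
```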
ISBN (Print): 9798350353006
This paper introduces a novel top-down representation approach for deformable image registration, which estimates the deformation field by capturing various short- and long-range flow features at different scale levels. As a Hierarchical vision Transformer (H-ViT), we propose a dual self-attention and cross-attention mechanism that uses high-level features in the deformation field to represent low-level ones, enabling information streams in the deformation field across all voxel patch embeddings irrespective of their spatial proximity. Since high-level features contain abstract flow patterns, such patterns are expected to effectively contribute to the representation of the deformation field at lower scales. While the self-attention module utilizes within-scale, short-range patterns for representation, the cross-attention modules dynamically look for key tokens across different scales to further interact with the local query voxel patches. Our method shows superior accuracy and visual quality over state-of-the-art registration methods on five publicly available datasets, highlighting a substantial enhancement in the performance of medical image registration. The project link is available at https://***/hvit.
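A minimal sketch of the cross-scale interaction described above: fine-scale voxel-patch tokens first apply within-scale self-attention, then cross-attend to coarse-scale deformation features regardless of spatial proximity. Shapes, dimensions, and module names are assumptions, not the H-ViT code.

```python
# Illustrative cross-scale block: fine tokens (queries) attend to coarse
# deformation features (keys/values) so abstract flow patterns inform the
# finer deformation field. Sizes and names are assumptions.
import torch
import torch.nn as nn

class CrossScaleBlock(nn.Module):
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, fine_tokens, coarse_tokens):
        # Within-scale, short-range interactions among fine tokens.
        x = self.norm1(fine_tokens)
        fine_tokens = fine_tokens + self.self_attn(x, x, x)[0]
        # Cross-scale: fine queries look up coarse keys/values irrespective of
        # spatial proximity.
        q = self.norm2(fine_tokens)
        fine_tokens = fine_tokens + self.cross_attn(q, coarse_tokens, coarse_tokens)[0]
        return fine_tokens

fine = torch.randn(1, 4096, 96)    # e.g. 16^3 voxel patches at a fine scale
coarse = torch.randn(1, 512, 96)   # e.g. 8^3 patches at a coarser scale
print(CrossScaleBlock()(fine, coarse).shape)   # torch.Size([1, 4096, 96])
```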
ISBN (Print): 9798350353006
Large vision-language models (VLMs) like CLIP have demonstrated good zero-shot learning performance in the unsupervised domain adaptation task. Yet, most transfer approaches for VLMs focus on either the language or visual branches, overlooking the nuanced interplay between both modalities. In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation. Leveraging insights from modality gap studies, we craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components. Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information while maintaining modality-specific nuances. We align features across domains using a modality discriminator. Comprehensive evaluations on three benchmarks reveal our approach sets a new state-of-the-art with minimal computational costs. Code: https://***/TL-UESTC/UniMoS.
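The sketch below illustrates one plausible reading of the modality-separation idea: two light heads split a CLIP image feature into a language-associated component (classified against text embeddings) and a vision-associated component (classified by a linear head), with a small discriminator available for domain alignment. All module names and dimensions are assumptions, not the UniMoS code.

```python
# Minimal sketch of modality separation over frozen CLIP features.
# Names, widths, and the class count are illustrative assumptions.
import torch
import torch.nn as nn

class ModalitySeparation(nn.Module):
    def __init__(self, dim=512, num_classes=65):
        super().__init__()
        self.to_lang = nn.Linear(dim, dim)       # language-associated component
        self.to_vis = nn.Linear(dim, dim)        # vision-associated component
        self.vis_head = nn.Linear(dim, num_classes)
        self.domain_disc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, clip_img_feat, text_embeds):
        # clip_img_feat: (B, dim); text_embeds: (num_classes, dim), both L2-normalised.
        f_lang = self.to_lang(clip_img_feat)
        f_vis = self.to_vis(clip_img_feat)
        logits_lang = nn.functional.normalize(f_lang, dim=-1) @ text_embeds.t()
        logits_vis = self.vis_head(f_vis)
        domain_logit = self.domain_disc(f_lang + f_vis)   # for adversarial alignment
        return logits_lang, logits_vis, domain_logit

img = nn.functional.normalize(torch.randn(4, 512), dim=-1)
txt = nn.functional.normalize(torch.randn(65, 512), dim=-1)
lang, vis, dom = ModalitySeparation()(img, txt)
print(lang.shape, vis.shape, dom.shape)   # (4, 65) (4, 65) (4, 1)
```

An ensemble of the two logit streams would then give the final prediction, with the discriminator driving cross-domain alignment.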
ISBN (Print): 9798350353006
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next-token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
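The training objective itself is plain next-token prediction, which the sketch below illustrates: assuming frames and annotations have already been mapped to discrete visual tokens (e.g., by a VQGAN-style tokenizer, not shown), a "visual sentence" is a 1-D token sequence trained with cross-entropy. The tiny causal transformer is a stand-in, not the LVM architecture.

```python
# Sketch of the objective only: visual tokens in, next-token cross-entropy out.
# Vocabulary size, widths, and depth are toy values, not the LVM's.
import torch
import torch.nn as nn

VOCAB, DIM, CTX = 8192, 256, 1024            # codebook size, width, context length

class TinyVisualLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Parameter(torch.zeros(1, CTX, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                               # tokens: (B, T) ints
        T = tokens.size(1)
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.embed(tokens) + self.pos[:, :T]
        h = self.blocks(h, mask=causal_mask)
        return self.head(h)                                  # (B, T, VOCAB)

visual_sentence = torch.randint(0, VOCAB, (2, 256))          # two toy token sequences
logits = TinyVisualLM()(visual_sentence)
loss = nn.functional.cross_entropy(                          # next-token prediction
    logits[:, :-1].reshape(-1, VOCAB), visual_sentence[:, 1:].reshape(-1))
print(loss.item())
```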
ISBN (Print): 9798350370287
The proceedings contain 123 papers. The topics discussed include: the SARFish dataset and challenge; NORPPA: NOvel ringed seal re-identification by pelage pattern aggregation; multiple toddler tracking in indoor videos; challenges in video-based infant action recognition: a critical examination of the state of the art; KABR: in-situ dataset for Kenyan animal behavior recognition from drone videos; the hitchhiker's guide to endangered species pose estimation; efficient domain adaptation via generative prior for 3D infant pose estimation; dynamic Gaussian splatting from markerless motion capture reconstruct infants movements; neural texture puppeteer: a framework for neural geometry and texture rendering of articulated shapes, enabling re-identification at interactive speed; and DigiDogs: single-view 3D pose estimation of dogs using synthetic training data.
ISBN (Print): 9798350353006
We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually-annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how actions unfold in time. Through a novel annotation procedure, we extend the Ego4D dataset, adding manually labeled Egocentric Action Scene Graphs which offer a rich set of annotations for long-form egocentric video understanding. We hence define the EASG generation task and provide a baseline approach, establishing preliminary benchmarks. Experiments on two downstream tasks, action anticipation and activity summarization, highlight the effectiveness of EASGs for long-form egocentric video understanding. We will release the dataset and code to replicate experiments and annotations; the code is available at https://***/fpv-iplab/EASG.
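To make the representation concrete, here is a hypothetical, minimal schema for a single temporally evolving EASG segment: a camera-wearer node, a verb node, object nodes, and typed edges. Field names and relation labels are illustrative; the released annotations may use a different schema.

```python
# Hypothetical minimal data structure for one EASG segment; not the official format.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    node_type: str                      # "camera_wearer" | "verb" | "object"
    label: str

@dataclass
class Edge:
    source: str
    target: str
    relation: str                       # e.g. "verb", "direct_object", "with"

@dataclass
class EASGFrame:
    start_s: float                      # segment boundaries in seconds
    end_s: float
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

frame = EASGFrame(
    start_s=12.4, end_s=14.1,
    nodes=[Node("cw", "camera_wearer", "camera wearer"),
           Node("v1", "verb", "cut"),
           Node("o1", "object", "onion"),
           Node("o2", "object", "knife")],
    edges=[Edge("cw", "v1", "verb"),
           Edge("v1", "o1", "direct_object"),
           Edge("v1", "o2", "with")],
)
print(f"{frame.nodes[1].label} {frame.edges[1].relation} -> {frame.nodes[2].label}")
```

A video is then a time-ordered list of such segments, which is what makes the representation suitable for long-form tasks like anticipation and summarization.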
ISBN (Print): 9798350353006
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object pairs based on a limited set of observed examples. Current CZSL methodologies, despite their advancements, tend to neglect the distinct specificity levels present in attributes. For instance, given images of sliced strawberries, they may fail to prioritize 'Sliced-Strawberry' over a generic 'Red-Strawberry', despite the former being more informative. They also suffer from a ballooning search space when shifting from Closed-World (CW) to Open-World (OW) CZSL. To address these issues, we introduce the Context-based and Diversity-driven Specificity learning framework for CZSL (CDS-CZSL). Our framework evaluates the specificity of attributes by considering the diversity of objects they apply to and their related context. This novel approach allows for more accurate predictions by emphasizing specific attribute-object pairs and improves composition filtering in OW-CZSL. We conduct experiments in both CW and OW scenarios, and our model achieves state-of-the-art results across three datasets.
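The sketch below shows one simple way a diversity-driven specificity score could work: an attribute observed with many distinct objects ('red') is treated as less specific than one tied to few objects ('sliced'). The entropy-based formula is our own stand-in, not necessarily the scoring used in CDS-CZSL.

```python
# Stand-in specificity score: lower object diversity => higher specificity.
from collections import Counter
import math

def specificity(attr, train_pairs):
    """Higher value = more specific attribute (lower object diversity)."""
    objs = [o for a, o in train_pairs if a == attr]
    counts = Counter(objs)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 / (1.0 + entropy)

train_pairs = [("red", "strawberry"), ("red", "car"), ("red", "apple"),
               ("red", "tomato"), ("sliced", "strawberry"), ("sliced", "bread")]
for attr in ("red", "sliced"):
    print(attr, round(specificity(attr, train_pairs), 3))
# "sliced" scores higher, so 'Sliced-Strawberry' would be preferred over 'Red-Strawberry'.
```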
ISBN (Print): 9798350353006
Transductive inference has been widely investigated in few-shot image classification, but completely overlooked in the recent, fast-growing literature on adapting vision-language models like CLIP. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge, in which inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently. We initially construct informative vision-text probability features, leading to a classification problem on the unit simplex set. Inspired by Expectation-Maximization (EM), our optimization-based classification objective models the data probability distribution for each class using a Dirichlet law. The minimization problem is then tackled with a novel block Majorization-Minimization algorithm, which simultaneously estimates the distribution parameters and class assignments. Extensive numerical experiments on 11 datasets underscore the benefits and efficacy of our batch inference approach. On zero-shot tasks with test batches of 75 samples, our approach yields a near-20% improvement in ImageNet accuracy over CLIP's zero-shot performance. Additionally, we outperform state-of-the-art methods in the few-shot setting. The code is available at: https://***/SegoleneMartin/transductive-CLIP.
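To convey the transductive setup, the sketch below projects vision-text similarities onto the simplex and then alternates between soft class assignments under per-class Dirichlet densities and refitting the Dirichlet parameters. The moment-matching parameter update is a simple stand-in for the paper's block Majorization-Minimization steps, and the data and names are assumptions.

```python
# EM-style sketch on synthetic data: Dirichlet densities over simplex features.
import numpy as np
from scipy.stats import dirichlet
from scipy.special import softmax

rng = np.random.default_rng(0)
K, N = 3, 75                                    # classes, query batch size
sims = rng.normal(size=(N, K)) + np.eye(K)[rng.integers(0, K, N)] * 3.0
probs = softmax(sims, axis=1)                   # points on the unit simplex
alpha = np.ones((K, K)) + np.eye(K)             # one Dirichlet per class (init)

for _ in range(10):                             # EM-style alternation
    # E-step: responsibilities from Dirichlet log-densities (uniform prior).
    logp = np.stack([dirichlet.logpdf(probs.T, alpha[k]) for k in range(K)], axis=1)
    resp = softmax(logp, axis=1)                # (N, K) soft assignments
    # M-step (stand-in): moment-matching update of each class's alpha.
    for k in range(K):
        w = resp[:, k:k + 1]
        mean = (w * probs).sum(0) / w.sum()
        var = (w * (probs - mean) ** 2).sum(0) / w.sum()
        s = np.median(mean * (1 - mean) / np.maximum(var, 1e-6) - 1)
        alpha[k] = np.maximum(mean * s, 1e-3)

print(resp.argmax(1)[:10])                      # hard labels for the first queries
```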
ISBN (Print): 9798350353006
Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, but their intermediate representations are also useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure. In this work, we analyze the 3D awareness of visual foundation models. We posit that 3D awareness implies that representations (1) encode the 3D structure of the scene and (2) consistently represent the surface across views. We conduct a series of experiments using task-specific probes and zero-shot inference procedures on frozen features. Our experiments reveal several limitations of the current models. Our code and analysis can be found at https://***/mbanani/probe3d.
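A minimal sketch of the probing protocol: freeze a pretrained backbone (here DINO ViT-S/16 from torch.hub, which downloads weights and needs network access), train a small dense probe on its patch features to regress depth, and read the probe's error as a measure of how much 3D structure the frozen features encode. The probe design and training step are illustrative, not the probe3d setup.

```python
# Sketch of a dense probe on frozen features; only the probe receives gradients.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")  # frozen ViT-S/16
backbone.eval().requires_grad_(False)

probe = nn.Sequential(nn.Conv2d(384, 128, 1), nn.ReLU(), nn.Conv2d(128, 1, 1))
opt = torch.optim.AdamW(probe.parameters(), lr=1e-3)

def patch_features(images):                    # images: (B, 3, 224, 224)
    with torch.no_grad():
        tokens = backbone.get_intermediate_layers(images, n=1)[0][:, 1:]  # drop CLS
    B, N, C = tokens.shape
    s = int(N ** 0.5)                          # 14x14 patch grid for ViT-S/16 at 224px
    return tokens.transpose(1, 2).reshape(B, C, s, s)

images = torch.randn(2, 3, 224, 224)           # stand-in batch
target = torch.rand(2, 1, 14, 14)              # stand-in low-resolution depth targets
pred = probe(patch_features(images))
loss = nn.functional.l1_loss(pred, target)
loss.backward(); opt.step()                    # only the probe is updated
print(loss.item())
```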