Search Results - Inner Mongolia University Library

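Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=discipline classification number, U=all fields, Y=year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016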
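
The field codes and operator rules above can be sketched as a small query builder. This is a minimal illustration and not part of the library's system: the helper names `term`, `combine`, and `group` are hypothetical, and the only rules assumed are the ones stated in the help text (known field codes, uppercase operators, one space on each side of an operator).

```python
# Field codes and operators exactly as listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value condition (hypothetical helper)."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join conditions with an uppercase operator, one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}: {operator!r}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
subjects = group(combine("OR", term("K", "图书馆学"), term("K", "情报学")))
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase. How the real search box treats a malformed query is not documented on this page.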

Refine results

Document type

20,994 conference papers
99 books
85 journal articles
1 thesis

Collection scope

21,178 electronic documents
1 print holding

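Discipline classification

13,603 papers: Engineering
- 11,179 papers: Computer Science and Technology...
- 2,631 papers: Mechanical Engineering
- 2,542 papers: Software Engineering
- 990 papers: Optical Engineering
- 849 papers: Electrical Engineering
- 676 papers: Control Science and Engineering
- 487 papers: Information and Communication Engineering
- 242 papers: Instrument Science and Technology
- 215 papers: Surveying and Mapping Science and Technology
- 159 papers: Biomedical Engineering (may confer...
- 150 papers: Bioengineering
- 139 papers: Electronic Science and Technology (may...
- 69 papers: Safety Science and Engineering
- 67 papers: Chemical Engineering and Technology
- 55 papers: Architecture
- 53 papers: Civil Engineering
- 43 papers: Mechanics (may confer Engineering, Sc...
- 41 papers: Aeronautical and Astronautical Science and Tech...
3,462 papers: Medicine
- 3,452 papers: Clinical Medicine
- 41 papers: Basic Medicine (may confer Medicine...
2,483 papers: Science
- 1,247 papers: Mathematics
- 1,213 papers: Physics
- 446 papers: Statistics (may confer Science,...
- 418 papers: Biology
- 269 papers: Systems Science
- 67 papers: Chemistry
424 papers: Management
- 218 papers: Management Science and Engineering (may...
- 217 papers: Library, Information and Archives Manage...
- 43 papers: Business Administration
144 papers: Art
- 142 papers: Design (may confer Art...
41 papers: Law
31 papers: Agriculture
12 papers: Economics
10 papers: Education
6 papers: Literature
3 papers: Military Science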
Topic

8,072 papers: computer vision
2,879 papers: pattern recognit...
2,859 papers: training
1,808 papers: computational mo...
1,718 papers: visualization
1,478 papers: cameras
1,381 papers: shape
1,374 papers: face recognition
1,364 papers: three-dimensiona...
1,342 papers: feature extracti...
1,269 papers: image segmentati...
1,156 papers: robustness
1,109 papers: semantics
982 papers: layout
978 papers: object detection
953 papers: computer archite...
952 papers: benchmark testin...
931 papers: codes
918 papers: object recogniti...
899 papers: computer science

Institution

174 papers: univ sci & techn...
154 papers: carnegie mellon ...
149 papers: univ chinese aca...
144 papers: chinese univ hon...
110 papers: microsoft resear...
104 papers: zhejiang univ pe...
98 papers: swiss fed inst t...
93 papers: tsinghua univ pe...
92 papers: tsinghua univers...
90 papers: microsoft res as...
88 papers: shanghai ai lab ...
83 papers: zhejiang univers...
76 papers: alibaba grp peop...
74 papers: hong kong univ s...
73 papers: university of sc...
72 papers: peking univ peop...
68 papers: shanghai jiao to...
68 papers: university of ch...
66 papers: google res mount...
66 papers: univ oxford oxfo...

Author

83 papers: van gool luc
71 papers: zhang lei
60 papers: timofte radu
49 papers: yang yi
49 papers: luc van gool
48 papers: xiaoou tang
43 papers: darrell trevor
43 papers: tian qi
42 papers: loy chen change
42 papers: sun jian
41 papers: qi tian
37 papers: vasconcelos nuno
37 papers: liu yang
37 papers: chen xilin
37 papers: li fei-fei
36 papers: liu xiaoming
36 papers: shan shiguang
36 papers: li stan z.
36 papers: torralba antonio
33 papers: zhou jie

Language

21,137 papers: English
31 papers: Chinese
5 papers: Turkish
4 papers: Other
2 papers: Japanese

Search query: "Any field = 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011"

21,179 records in total; showing records 121-130.

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chatterjee, Agneet; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou
Affiliations: Arizona State Univ, Tempe, AZ 85281, USA; Univ Maryland Baltimore Cty, Baltimore, MD 21228, USA

ISBN: (print) 9798350353013; 9798350353006

Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance. Although yielding impressive results, the impact of the language prior, particularly in terms of generalization and robustness, remains unexplored. In this paper, we address this gap by quantifying the impact of this prior and introduce methods to benchmark its effectiveness across various settings. We generate "low-level" sentences that convey object-centric, three-dimensional spatial relationships, incorporate them as additional language priors and evaluate their downstream impact on depth estimation. Our key finding is that current language-guided depth estimators perform optimally only with scene-level descriptions and counter-intuitively fare worse with low-level descriptions. Despite leveraging additional data, these methods are not robust to directed adversarial attacks and decline in performance with an increase in distribution shift. Finally, to provide a foundation for future research, we identify points of failure and offer insights to better understand these shortcomings. With an increasing number of methods using language for depth estimation, our findings highlight the opportunities and pitfalls that require careful consideration for effective deployment in real-world settings.

Keywords: low-level vision; robustness; vision and language

Efficient Test-Time Adaptation of Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Karmanov, Adilbek; Guan, Dayan; Lu, Shijian; El Saddik, Abdulmotaleb; Xing, Eric
Affiliations: Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates; Nanyang Technol Univ, Singapore, Singapore; Univ Ottawa, Ottawa, ON, Canada; Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: (print) 9798350353006

Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts during the test time. Though prior studies have achieved very promising performance, they involve intensive computation which is severely unaligned with test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging the key-value cache, TDA allows adapting to test data gradually via progressive pseudo label refinement which is super-efficient without incurring any backpropagation. In addition, we introduce negative pseudo labeling that alleviates the adverse impact of pseudo-label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo label predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency as compared with the state-of-the-art. The code has been released at https://***/tda/.

Projecting Trackable Thermal Patterns for Dynamic Computer Vision

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sheinin, Mark; Sankaranarayanan, Aswin C.; Narasimhan, Srinivasa G.
Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: (print) 9798350353006

Adding artificial patterns to objects, like QR codes, can ease tasks such as object tracking, robot navigation, and conveying information (e.g., a label or a website link). However, these patterns require a physical application and they alter the object's appearance. Conversely, projected patterns can temporarily change the object's appearance, aiding tasks like 3D scanning and retrieving object textures and shading. However, projected patterns impede dynamic tasks like object tracking because they do not 'stick' to the object's surface. Or do they? This paper introduces a novel approach combining the advantages of projected and persistent physical patterns. Our system projects heat patterns using a laser beam (similar in spirit to a LIDAR), which a thermal camera observes and tracks. Such thermal patterns enable tracking poorly-textured objects whose tracking is highly challenging with standard cameras while not affecting the object's appearance or physical properties. To avail these thermal patterns in existing vision frameworks, we train a network to reverse heat diffusion's effects and remove inconsistent pattern points between different thermal frames. We prototyped and tested this approach on dynamic vision tasks like structure from motion, optical flow, and object tracking of everyday textureless objects.

Keywords: 3D reconstruction; heat; heat diffusion; laser; optical flow; SLAM; structure from motion; thermal; tracking

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Howard, Phillip; Madasu, Avinash; Le, Tiep; Moreno, Gustavo Lujan; Bhiwandiwalla, Anahita; Lal, Vasudev
Affiliations: Intel Labs, Santa Clara, CA 95052, USA

ISBN: (print) 9798350353006

While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). Through our over-generate-then-filter methodology, we produce SocialCounterfactuals, a high-quality dataset containing 171k image-text pairs for probing intersectional biases related to gender, race, and physical characteristics. We conduct extensive experiments to demonstrate the usefulness of our generated dataset for probing and mitigating intersectional social biases in state-of-the-art VLMs.

Keywords: counterfactuals; fairness; intersectionality; social bias; vision-language models

Sequential Modeling Enables Scalable Learning for Large Vision Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bai, Yutong; Geng, Xinyang; Mangalam, Karttikeya; Bar, Amir; Yuille, Alan L.; Darrell, Trevor; Malik, Jitendra; Efros, Alexei A.
Affiliations: UC Berkeley BAIR, Berkeley, CA 94720, USA; Johns Hopkins Univ, Baltimore, MD 21218, USA

ISBN: (print) 9798350353006

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.

Keywords: pretraining; scaling; self-supervised learning

OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Haichao; Xu, Yi; Lu, Hongsheng; Shimizu, Takayuki; Fu, Yun
Affiliations: Northeastern Univ, 360 Huntington Ave, Boston, MA 02115, USA; Toyota Motor North Amer, 465 N Bernardo Ave, Mountain View, CA 94043, USA

ISBN: (print) 9798350353006

Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges associated with out-of-view objects and the noise inherent in sensor data due to limited camera range, physical obstructions, and the absence of ground truth for denoised sensor data. Such oversights are critical safety concerns, as they can result in missing essential, non-visible objects. To bridge this gap, we present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique. Our approach denoises noisy sensor observations in an unsupervised manner and precisely maps sensor-based trajectories of out-of-sight objects into visual trajectories. This method has demonstrated state-of-the-art performance in out-of-sight noisy sensor trajectory denoising and prediction on the Vi-Fi and JRDB datasets. By enhancing trajectory prediction accuracy and addressing the challenges of out-of-sight objects, our work significantly contributes to improving the safety and reliability of autonomous driving in complex environments. Our work represents the first initiative towards Out-Of-Sight Trajectory prediction (OOSTraj), setting a new benchmark for future research.

Keywords: autonomous driving; denoising; OOSTraj; out-of-sight trajectory prediction; trajectory prediction; vision-positioning

RegionGPT: Towards Region Understanding Vision Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Guo, Qiushan; De Mello, Shalini; Yin, Hongxu; Byeon, Wonmin; Cheung, Ka Chun; Yu, Yizhou; Luo, Ping; Liu, Sifei
Affiliations: Univ Hong Kong, Hong Kong, Peoples R China; NVIDIA, San Francisco, CA, USA

ISBN: (print) 9798350353006

Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions. To address this, we introduce RegionGPT (RGPT for short), a novel framework designed for complex region-level captioning and understanding. RGPT enhances the spatial awareness of regional representation with simple yet effective modifications to existing visual encoders in VLMs. We further improve performance on tasks requiring a specific output scope by integrating task-guided instruction prompts during both training and inference phases, while maintaining the model's versatility for general-purpose tasks. Additionally, we develop an automated region caption data generation pipeline, enriching the training set with detailed region-level captions. We demonstrate that a universal RGPT model can be effectively applied and significantly enhances performance across a range of region-level tasks, including but not limited to complex region descriptions, reasoning, object classification, and referring expressions comprehension. Code will be released at the project page.

BIOCLIP: A Vision Foundation Model for the Tree of Life

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Stevens, Samuel; Wu, Jiaman; Thompson, Matthew J.; Campolongo, Elizabeth G.; Song, Chan Hee; Carlyle, David Edward; Dong, Li; Dahdul, Wasila M.; Stewart, Charles; Berger-Wolf, Tanya; Chao, Wei-Lun; Su, Yu
Affiliations: Ohio State Univ, Columbus, OH 43210, USA; Microsoft Res, Mountain View, CA, USA; Univ Calif Irvine, Irvine, CA, USA; Rensselaer Polytech Inst, Troy, NY, USA

ISBN: (print) 9798350353006

Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is a timely need. To approach this, we curate and release TREEOFLIFE-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BIOCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TREEOFLIFE-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks and find that BIOCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation reveals that BIOCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability.

Keywords: computer vision; evolutionary biology & ecology; imageomics; machine learning

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cai, Shengqu; Ceylan, Duygu; Gadelha, Matheus; Huang, Chun-Hao Paul; Wang, Tuanfeng Yang; Wetzstein, Gordon
Affiliations: Stanford Univ, Stanford, CA 94305, USA; Adobe Res, San Francisco, CA, USA

ISBN: (print) 9798350353013; 9798350353006

Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, which hinders users from applying their own creativity rather than amplifying it. To address this challenge, we present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. For this purpose, our approach takes an animated, low-fidelity rendered mesh as input and injects the ground truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path. Project page: ***/generative_rendering.

Keywords: animation; computer graphics; computer vision; generative models; video generation; video synthesis

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Leheng; Li, Yawei; Zhou, Xingyu; Zhao, Xiaorui; Gu, Shuhang
Affiliations: Univ Elect Sci & Technol China, Chengdu, Peoples R China; Swiss Fed Inst Technol, Comp Vis Lab, Zurich, Switzerland; Swiss Fed Inst Technol, Integrated Syst Lab, Zurich, Switzerland

ISBN: (print) 9798350353013; 9798350353006

Single Image Super-Resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges still remain, particularly the limited receptive field caused by window-based self-attention. To address these issues, we introduce a group of auxiliary Adaptive Token Dictionary to SR Transformer and establish an ATD-SR method. The introduced token dictionary could learn prior information from training data and adapt the learned prior to a specific test image through an adaptive refinement step. The refinement strategy could not only provide global information to all input tokens but also group image tokens into categories. Based on category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens for enhancing input features. The experimental results show that our method achieves the best performance on various single image super-resolution benchmarks.

Keywords: dictionary learning; image super-resolution; vision transformer

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021