ISBN: (Print) 9781424439942
Temporal segmentation of human motion into actions is central to the understanding and building of computational models of human motion and activity recognition. Several issues contribute to the challenge of temporal segmentation and classification of human motion. These include the large variability in the temporal scale and periodicity of human actions, the complexity of representing articulated motion, and the exponential nature of all possible movement combinations. We provide initial results from investigating two distinct problems: classification of the overall task being performed, and the more difficult problem of classifying individual frames over time into specific actions. We explore first-person sensing through a wearable camera and Inertial Measurement Units (IMUs) for temporally segmenting human motion into actions and performing activity classification in the context of cooking and recipe preparation in a natural environment. We present baseline results for supervised and unsupervised temporal segmentation, and for recipe recognition, on the CMU Multimodal Activity database (CMU-MMAC).
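As a concrete, hedged illustration of what unsupervised temporal segmentation of the IMU stream can look like, the sketch below splits an accelerometer signal wherever its short-term motion energy changes sharply. This is a generic stand-in, not the CMU-MMAC baseline method; the window size, threshold, and data layout are assumptions.

    # Illustrative sketch only: a simple energy-based change-point segmentation of an
    # IMU accelerometer stream. All names and thresholds below are assumptions.
    import numpy as np

    def segment_imu(accel, window=30, threshold=1.5):
        """Split an (N, 3) accelerometer stream into segments wherever the
        short-term variance of the motion magnitude changes sharply."""
        mag = np.linalg.norm(accel, axis=1)
        # short-term variance in a sliding window
        var = np.array([mag[max(0, i - window):i + window].var()
                        for i in range(len(mag))])
        boundaries = [0]
        for i in range(window, len(var) - window):
            prev = var[i - window:i].mean() + 1e-8
            nxt = var[i:i + window].mean() + 1e-8
            ratio = max(prev, nxt) / min(prev, nxt)
            if ratio > threshold and i - boundaries[-1] > window:
                boundaries.append(i)
        boundaries.append(len(mag))
        return list(zip(boundaries[:-1], boundaries[1:]))

    # Example: three synthetic "actions" with different motion intensity.
    rng = np.random.default_rng(0)
    stream = np.concatenate([rng.normal(0, s, (300, 3)) for s in (0.2, 1.0, 0.4)])
    print(segment_imu(stream))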
ISBN: (Print) 9781424439942
We demonstrate that it is possible to automatically find representative example images of a specified object category. These canonical examples are perhaps the kind of images that one would show a child to teach them what, for example, a horse is: images with a large object clearly separated from the background. Given a large collection of images returned by a web search for an object category, our approach proceeds without any user-supplied training data for the category. First, images are ranked according to a category-independent composition model that predicts whether they contain a large, clearly depicted object, and outputs an estimated location of that object. Then, local features calculated on the proposed object regions are used to eliminate images not distinctive to the category, and to cluster images by similarity of object appearance. We present results and a user evaluation on a variety of object categories, demonstrating the effectiveness of the approach.
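A minimal sketch of the final step described above: cluster per-image object-region descriptors and propose the images nearest the largest cluster's centroid as canonical candidates. The composition model and the actual features are not reproduced; the descriptors here are assumed to be precomputed vectors.

    # Cluster object-region descriptors and rank candidates for a "canonical" image.
    import numpy as np
    from sklearn.cluster import KMeans

    def canonical_indices(descriptors, n_clusters=5):
        """descriptors: (n_images, d) array of per-image object-region features."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
        # largest cluster = most typical object appearance for the category
        largest = np.bincount(km.labels_).argmax()
        members = np.where(km.labels_ == largest)[0]
        dists = np.linalg.norm(descriptors[members] - km.cluster_centers_[largest], axis=1)
        return members[np.argsort(dists)]   # best canonical candidates first

    # Usage with random stand-in descriptors:
    feats = np.random.default_rng(1).normal(size=(200, 128))
    print(canonical_indices(feats)[:5])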
ISBN: (Print) 9781424439942
To address the challenges of non-cooperative, large-distance human signature detection, we present a novel multimodal remote audio/video acquisition system. The system mainly consists of a laser Doppler vibrometer (LDV) and a pan-tilt-zoom (PTZ) camera. The LDV is a unique remote hearing sensor based on the principle of laser interferometry. However, it needs an appropriate surface to modulate the speech of a human subject and reflect the laser beam back to the LDV receiver. Manually aiming the laser beam at a target is very difficult at distances of more than 20 meters. Therefore, the PTZ camera is used to capture video of the human subject, track the subject when he/she moves, and analyze the image in real time to find a good reflection surface for LDV measurements. Experiments show that the integration of these two sensory components is ideal for multimodal human signature detection at a large distance.
ISBN: (Print) 9781424439942
An algorithm is proposed for the 3D modeling of static scenes based solely on the range and intensity data acquired by a Time-of-Flight camera during an arbitrary movement. No additional scene acquisition devices, such as inertial sensors, positioning robots, or intensity-based cameras, are incorporated. The current pose is estimated by maximizing the uncentered correlation coefficient between edges detected in the current and a preceding frame, at a minimum frame rate of four fps and an average accuracy of 45 mm. The paper also describes several extensions for robust registration, such as multiresolution hierarchies and a projection Iterative Closest Point algorithm. The basic registration algorithm and its extensions were extensively evaluated against ground-truth data to validate their accuracy, robustness, and real-time capability.
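The similarity measure named above, the uncentered correlation coefficient between two edge maps, can be written in a few lines. The sketch below shows only the score; the surrounding pose search that warps one frame by each candidate pose before scoring depends on the ToF camera model and is omitted.

    # Uncentered correlation coefficient between two edge images of identical shape.
    import numpy as np

    def uncentered_correlation(edges_a, edges_b):
        """Cosine-style similarity of two edge maps, in [0, 1] for non-negative
        inputs (1 = identical edge structure)."""
        a = edges_a.ravel().astype(float)
        b = edges_b.ravel().astype(float)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    # Example: a shifted copy scores lower than a perfect match.
    img = np.zeros((64, 64)); img[20:44, 30] = 1.0          # a thin vertical edge
    print(uncentered_correlation(img, img))                  # -> 1.0
    print(uncentered_correlation(img, np.roll(img, 3, 1)))   # -> 0.0 (edge no longer overlaps)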
ISBN: (Print) 9781424439942
The number of digital images that need to be acquired, analyzed, classified, stored, and retrieved in medical centers is growing exponentially with the advances in medical imaging technologies. Accordingly, medical image classification and retrieval has become a popular topic in recent years. Despite many projects focusing on this problem, proposed solutions are still far from being sufficiently accurate for real-life implementations. Interpreting medical image classification and retrieval as a multi-class classification task, in this work we investigate the performance of five different feature types in an SVM-based learning framework for classification of human body X-ray images into classes corresponding to body parts. Our comprehensive experiments show that four conventional feature types provide performances comparable to the literature but with low per-class accuracies, whereas local binary patterns produce not only very good global accuracy but also good class-specific accuracies with respect to the features used in the literature.
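A hedged sketch of the best-performing configuration described above: uniform local binary pattern (LBP) histograms fed to a linear SVM. Data loading, the other four feature types, and the evaluation protocol are omitted; the image arrays, labels, and LBP parameters below are assumptions for illustration.

    # LBP histogram features + linear SVM for body-part classification (sketch).
    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.svm import LinearSVC

    def lbp_histogram(gray_image, points=8, radius=1):
        """Histogram of uniform LBP codes as a global texture descriptor."""
        codes = local_binary_pattern(gray_image, points, radius, method="uniform")
        n_bins = points + 2                      # uniform patterns + one non-uniform bin
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        return hist

    def train_bodypart_classifier(images, labels):
        """images: list of 2-D grayscale X-ray arrays; labels: body-part ids."""
        features = np.array([lbp_histogram(img) for img in images])
        return LinearSVC(C=1.0, max_iter=5000).fit(features, labels)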
ISBN: (Print) 9781424439942
In this paper, we focus on face recognition over image sets, where each set is represented by a linear subspace. Linear Discriminant Analysis (LDA) is adopted for discriminative learning. After investigating the relation between regularization of the Fisher Criterion and the Maximum Margin Criterion, we present a unified framework for regularized LDA. Within this framework, the ratio-form maximization of regularized Fisher LDA can be reduced to a difference-form optimization with an additional constraint. By incorporating the empirical loss as the regularization term, we introduce a generalized Square Loss based Regularized LDA (SLR-LDA) with suggestions on parameter setting. Our approach achieves performance superior to state-of-the-art methods on face recognition. Its effectiveness is also clearly verified in general object and object category recognition experiments.
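To make the regularization of the Fisher criterion concrete, the sketch below implements plain regularized Fisher LDA, where the projection maximizes the ratio w'S_b w / w'(S_w + lam*I) w via a generalized eigenproblem. This is standard regularized LDA, not the paper's SLR-LDA (square-loss regularizer with a difference-form constraint); lam and the data layout are illustrative assumptions.

    # Regularized Fisher LDA via a generalized eigenproblem (sketch).
    import numpy as np
    from scipy.linalg import eigh

    def regularized_lda(X, y, n_components, lam=1e-2):
        """X: (n_samples, d); y: integer class labels. Returns a (d, n_components) W."""
        mean = X.mean(axis=0)
        d = X.shape[1]
        Sw = np.zeros((d, d)); Sb = np.zeros((d, d))
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)                       # within-class scatter
            Sb += len(Xc) * np.outer(mc - mean, mc - mean)      # between-class scatter
        # generalized eigenproblem: Sb w = eig * (Sw + lam*I) w
        vals, vecs = eigh(Sb, Sw + lam * np.eye(d))
        order = np.argsort(vals)[::-1]                          # largest ratios first
        return vecs[:, order[:n_components]]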
ISBN: (Print) 9781424439942
Matching vehicles subject to both large pose transformations and extreme illumination variations remains a technically challenging problem in computer vision. In this paper, we develop a new and robust framework for matching and recognizing vehicles with both highly varying poses and drastically changing illumination conditions. By effectively estimating both pose and illumination conditions, we can re-render the vehicle in the reference image to generate a relit image with the same pose and illumination conditions as the target image. We compare the relit image and the re-rendered target image to match vehicles in the original reference image and target image. Furthermore, no training is needed in our framework, and re-rendered vehicle images in other viewpoints and illumination conditions can be obtained from just a single input image. Experimental results demonstrate the robustness and efficacy of our framework, with the potential to generalize our current method from vehicles to other types of objects.
ISBN: (Print) 9781424439942
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at the score level is carried out using two approaches: weighted mean and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows assessing the performance of the presented algorithms. This dataset is made publicly available for research purposes.
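The simpler of the two fusion rules mentioned above, the weighted mean of per-class audio and video scores, is sketched below; the fuzzy integral is not reproduced. The weights, the normalization of scores to [0, 1], and the example class names are assumptions for illustration.

    # Score-level fusion by weighted mean of audio and video detection scores.
    import numpy as np

    def fuse_scores(audio_scores, video_scores, w_audio=0.6, w_video=0.4):
        """Each argument: (n_classes,) array of detection scores for one time window."""
        fused = w_audio * np.asarray(audio_scores) + w_video * np.asarray(video_scores)
        return int(np.argmax(fused)), fused

    # Example: audio slightly favors class 0, but video tips the decision to class 1.
    label, scores = fuse_scores([0.55, 0.45], [0.20, 0.80])
    print(label, scores)    # -> 1, [0.41 0.59]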
ISBN: (Print) 9781424439942
Faces are highly deformable objects which may easily change their appearance over time. Not all face areas are subject to the same variability. Therefore, decoupling the information from independent areas of the face is of paramount importance for improving the robustness of any face recognition technique. This paper presents a robust face recognition technique based on the extraction and matching of SIFT features related to independent face areas. Both a global and a local (recognition from parts) matching strategy are proposed. The local strategy is based on matching individual salient facial SIFT features connected to facial landmarks such as the eyes and the mouth. In the global matching strategy, all SIFT features are combined to form a single feature. In order to reduce identification errors, Dempster-Shafer decision theory is applied to fuse the two matching techniques. The proposed algorithms are evaluated on the ORL and IITK face databases. The experimental results demonstrate the effectiveness and potential of the proposed face recognition techniques, also in the case of partially occluded faces or missing information.
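Dempster's rule of combination, used above to fuse the global and local matchers, is sketched below on the simple frame {genuine, impostor} with an explicit "unknown" (whole-frame) mass. How the SIFT matching scores are mapped to these mass values is an assumption for illustration, not the paper's exact scheme.

    # Dempster's rule of combination for two matchers on the frame {genuine, impostor}.
    def dempster_combine(m1, m2):
        """m1, m2: dicts with masses for 'genuine', 'impostor', 'unknown'
        (the 'unknown' mass is assigned to the whole frame). Masses sum to 1."""
        conflict = m1["genuine"] * m2["impostor"] + m1["impostor"] * m2["genuine"]
        k = 1.0 - conflict                              # normalization constant
        genuine = (m1["genuine"] * m2["genuine"]
                   + m1["genuine"] * m2["unknown"]
                   + m1["unknown"] * m2["genuine"]) / k
        impostor = (m1["impostor"] * m2["impostor"]
                    + m1["impostor"] * m2["unknown"]
                    + m1["unknown"] * m2["impostor"]) / k
        unknown = (m1["unknown"] * m2["unknown"]) / k
        return {"genuine": genuine, "impostor": impostor, "unknown": unknown}

    # Global matcher fairly confident, local matcher undecided:
    print(dempster_combine({"genuine": 0.7, "impostor": 0.1, "unknown": 0.2},
                           {"genuine": 0.4, "impostor": 0.2, "unknown": 0.4}))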
ISBN: (Print) 9781424439942
The paper presents a study on color-to-gray image conversion from a novel point of view: face detection. To the best knowledge of the authors, research on this specific topic has not been conducted before. Our work reveals that the standard NTSC conversion is not optimal for face detection tasks, although it may be the best choice for displaying pictures on monochrome televisions. It is further found experimentally, with two AdaBoost-based face detection systems, that the detection rates may vary by up to 10% simply by changing the parameters of the RGB-to-gray conversion. On the other hand, the change has little influence on the false positive rates. Compared to the standard NTSC conversion, the detection rate with the best found parameter setting is 2.85% and 3.58% higher for the two evaluated face detection systems. Promisingly, the work suggests a new approach to color-to-gray conversion. It can easily be incorporated into most existing face detection systems for accuracy improvement without introducing any extra computational cost.
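The parameterized conversion studied above amounts to a weighted sum of the R, G, and B channels. The sketch below uses the standard NTSC/Rec. 601 weights as the baseline; the paper's best-found weights are not reproduced, so the weight triple is a placeholder to be tuned on a validation set.

    # Parameterized RGB-to-gray conversion (sketch).
    import numpy as np

    NTSC_WEIGHTS = (0.299, 0.587, 0.114)          # standard luma coefficients

    def rgb_to_gray(image, weights=NTSC_WEIGHTS):
        """image: (H, W, 3) RGB array. Weights are normalized to sum to 1 so the
        output stays in the input's intensity range."""
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        return image[..., :3].astype(float) @ w

    # A detector can then be run on rgb_to_gray(img, weights=candidate)
    # for each candidate weight triple during the parameter search.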