Search Results - Inner Mongolia University Library

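Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1 (subject term "library science" or "information science", author Fan Bingsi, years 1982-2016): (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2 (publication "Computer Applications and Software", any field C++ or Basic, subject term not Visual, years 2011-2016): P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016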
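The query syntax described above can be sketched in Python. This is a minimal illustration, not the library system's own code: the helper names `term`, `group`, and `chain` are hypothetical, and the sketch only assembles query strings that follow the stated rules (known field codes, uppercase operators, one space on each side). It does not submit anything to the catalog.

```python
# Hypothetical helpers (not part of the library's site): they only build
# query strings that follow the help text's field codes and operator rules.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(expression: str) -> str:
    """Wrap an expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


def chain(first: str, *rest: tuple[str, str]) -> str:
    """Append (operator, operand) pairs to a starting expression.

    Operators must be uppercase AND, OR, or NOT and are written with
    exactly one space on each side, as the search rules require.
    """
    expression = first
    for operator, operand in rest:
        if operator not in OPERATORS:
            raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
        expression += f" {operator} {operand}"
    return expression


# Rebuild the two examples from the help text.
example_one = chain(
    group(chain(term("K", "图书馆学"), ("OR", term("K", "情报学")))),
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
example_two = chain(
    term("P", "计算机应用与软件"),
    ("AND", group(chain(term("U", "C++"), ("OR", term("U", "Basic"))))),
    ("NOT", term("K", "Visual")),
    ("AND", term("Y", "2011-2016")),
)

if __name__ == "__main__":
    print(example_one)
    print(example_two)
```

Keeping the operator check inside `chain` means a lowercase `and` fails early instead of producing a query the catalog might silently treat as a search word.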


Refine search results

Document type

Conference papers (11,883)
Journal articles (5)

Holdings

Electronic documents (11,888)
Print holdings (0)

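Discipline classification

Engineering (8,055)
- Computer Science and Technology... (7,613)
- Mechanical Engineering (796)
- Electrical Engineering (688)
- Software Engineering (356)
- Control Science and Engineering (225)
- Optical Engineering (40)
- Bioengineering (19)
- Information and Communication Engineering (17)
- Biomedical Engineering (may be awarded... (12)
- Electronic Science and Technology (may... (6)
- Architecture (6)
- Transportation Engineering (6)
- Instrument Science and Technology (5)
- Chemical Engineering and Technology (5)
- Safety Science and Engineering (5)
- Civil Engineering (4)
Medicine (3,344)
- Clinical Medicine (3,343)
- Basic Medicine (may be awarded in Medicine... (4)
- Public Health and Preventive Medicine... (4)
Science (250)
- Systems Science (198)
- Physics (29)
- Biology (21)
- Mathematics (15)
- Statistics (may be awarded in Science, ... (9)
- Chemistry (4)
Management (17)
- Management Science and Engineering (may... (12)
- Library, Information and Archives Management... (7)
- Business Administration (5)
Law (3)
- Sociology (3)
Education (3)
- Education (3)
Agriculture (2)
Economics (1)
Military Science (1)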

Topic

computer vision (5,632)
training (2,668)
pattern recognit... (2,203)
computational mo... (1,746)
visualization (1,502)
three-dimensiona... (1,360)
semantics (1,074)
benchmark testin... (999)
codes (986)
computer archite... (959)
deep learning (891)
conferences (777)
task analysis (754)
feature extracti... (699)
transformers (561)
face recognition (533)
neural networks (527)
object detection (495)
image segmentati... (490)
cameras (468)

Institution

univ sci & techn... (174)
carnegie mellon ... (145)
univ chinese aca... (144)
tsinghua univ pe... (144)
chinese univ hon... (134)
zhejiang univ pe... (110)
peng cheng lab p... (109)
swiss fed inst t... (99)
tsinghua univers... (91)
shanghai ai lab ... (90)
sensetime res pe... (87)
shanghai jiao to... (86)
zhejiang univers... (83)
tech univ munich... (82)
university of sc... (79)
stanford univ st... (79)
univ hong kong p... (78)
australian natl ... (77)
alibaba grp peop... (76)
peng cheng labor... (75)

Author

timofte radu (75)
van gool luc (64)
zhang lei (50)
yang yi (43)
loy chen change (37)
tao dacheng (36)
zhou jie (32)
chen chen (31)
liu yang (30)
tian qi (30)
sun jian (29)
zha zheng-jun (29)
li xin (28)
qi tian (27)
vasconcelos nuno (26)
liu xiaoming (25)
darrell trevor (25)
zheng wei-shi (24)
luo ping (24)
ying shan (24)

Language

English (11,862)
Other (25)
Chinese (1)

Search query: "Any field=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,888 records in total; showing records 61-70


Honeybee: Locality-enhanced Projector for Multimodal LLM

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cha, Junbum; Kang, Wooyoung; Mun, Jonghwan; Roh, Byungseok

Affiliations: Kakao Brain Seongnam South Korea

ISBN: (print) 9798350353006

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of visual tokens, crucial for MLLMs' overall efficiency, and (ii) preservation of local context from visual features, vital for spatial understanding. Based on these findings, we propose a novel projector design that is both flexible and locality-enhanced, effectively satisfying the two desirable properties. Additionally, we present comprehensive strategies to effectively utilize multiple and multifaceted instruction datasets. Through extensive experiments, we examine the impact of individual design choices. Finally, our proposed MLLM, Honeybee, remarkably outperforms previous state-of-the-art methods across various benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench, achieving significantly higher efficiency. Code and models are available at https://github.com/kakaobrain/honeybee.

Keywords: Multimodal LLM; Vision-Language


Making Vision Transformers Truly Shift-Equivariant

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rojas-Gomez, Renan A.; Lim, Teck-Yian; Do, Minh N.; Yeh, Raymond A.

Affiliations: UIUC Dept Elect Engn Urbana IL 61801 USA; UIUC VinUni Illinois Smart Hlth Ctr Urbana IL USA; Purdue Univ Dept Comp Sci W Lafayette IN 47907 USA

ISBN: (print) 9798350353013; 9798350353006

In the field of computer vision, Vision Transformers (ViTs) have emerged as a prominent deep learning architecture. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs are susceptible to small spatial shifts in the input data - they lack shift-equivariance. To address this shortcoming, we introduce novel data-adaptive designs for each of the ViT modules that break shift-equivariance, such as tokenization, self-attention, patch merging, and positional encoding. With our proposed modules, we achieve perfect circular shift-equivariance across four prominent ViT architectures: Swin, SwinV2, CvT, and MViTv2. Additionally, we leverage our design to further enhance consistency under standard shifts. We evaluate our adaptive ViT models on image classification and semantic segmentation tasks. Our models achieve competitive performance across three diverse datasets, showcasing perfect (100%) circular shift consistency while improving standard shift consistency.

Keywords: shift equivariance; shift invariance; vision transformers


SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Shi, Xinyu; Hao, Zecheng; Yu, Zhaofei

Affiliations: Peking Univ Inst Artificial Intelligence Beijing Peoples R China; Peking Univ Sch Comp Sci Beijing Peoples R China

ISBN: (print) 9798350353013; 9798350353006

The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting local features. To address these challenges, we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer, which combines the ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts. Notably, our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time-steps, which is the state-of-the-art result in the SNN field. Codes are available at https://***/xyshi2000/SpikingResformer

Keywords: Spiking Neural Networks; Vision Transformer


Intrinsic Image Diffusion for Indoor Single-view Material Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kocsis, Peter; Sitzmann, Vincent; Niessner, Matthias

Affiliations: Tech Univ Munich Munich Germany; MIT EECS Cambridge MA 02139 USA

ISBN: (print) 9798350353013; 9798350353006

We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue, we advocate for a probabilistic formulation, where instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space. Furthermore, we show that utilizing the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and highly improves the generalization to real images. Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by 1.5dB on PSNR and by 45% better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.

Keywords: Appearance Decomposition; computer vision; Deep Learning; Diffusion; Graphics; Lighting Estimation; Material Estimation


Bi-Causal: Group Activity Recognition via Bidirectional Causality

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Youliang; Liu, Wenxuan; Xu, Danni; Zhou, Zhuo; Wang, Zheng

Affiliations: Wuhan Univ Natl Engn Res Ctr Multimedia Software Sch Comp Sci Inst Artificial Intelligence Wuhan Hubei Peoples R China; Hubei Key Lab Multimedia & Network Commun Engn Wuhan Hubei Peoples R China; Wuhan Univ Technol Wuhan Hubei Peoples R China; Natl Univ Singapore Singapore Singapore

ISBN: (print) 9798350353013; 9798350353006

Current approaches in Group Activity Recognition (GAR) predominantly emphasize Human Relations (HRs) while often neglecting the impact of Human-Object Interactions (HOIs). This study prioritizes the consideration of both HRs and HOIs, emphasizing their interdependence. Notably, employing Granger Causality Tests reveals the presence of bidirectional causality between HRs and HOIs. Leveraging this insight, we propose a Bidirectional-Causal GAR network. This network establishes a causality communication channel while modeling relations and interactions, enabling reciprocal enhancement between human-object interactions and human relations, ensuring their mutual consistency. Additionally, an Interaction Module is devised to effectively capture the dynamic nature of human-object interactions. Comprehensive experiments conducted on two publicly available datasets showcase the superiority of our proposed method over state-of-the-art approaches. Our project page: https://***/***/

Keywords: Bidirectional causality; Group activity recognition; Human-object relation


Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bastian, Lennart; Xie, Yizheng; Navab, Nassir; Laehner, Zorah

Affiliations: Tech Univ Munich Munich Germany; Univ Siegen Siegen Germany; Univ Bonn Bonn Germany; Lamarr Inst Bonn Germany

ISBN: (print) 9798350353013; 9798350353006

Non-isometric shape correspondence remains a fundamental challenge in computer vision. Traditional methods using Laplace-Beltrami operator (LBO) eigenmodes face limitations in characterizing high-frequency extrinsic shape changes like bending and creases. We propose a novel approach of combining the non-orthogonal extrinsic basis of eigenfunctions of the elastic thin-shell Hessian with the intrinsic ones of the LBO, creating a hybrid spectral space in which we construct functional maps. To this end, we present a theoretical framework to effectively integrate non-orthogonal basis functions into descriptor- and learning-based functional map methods. Our approach can be incorporated easily into existing functional map pipelines across varying applications and can handle complex deformations beyond isometries. We show extensive evaluations across various supervised and unsupervised settings and demonstrate significant improvements. Notably, our approach achieves up to 15% better mean geodesic error for non-isometric correspondence settings and up to 45% improvement in scenarios with topological noise. Code is available at: https://***/

Keywords: computer vision; Functional Maps; Non-isometric Shape Correspondence; Shape Matching; Topological Noise


On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chatterjee, Agneet; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou

Affiliations: Arizona State Univ Tempe AZ 85281 USA; Univ Maryland Baltimore Cty Baltimore MD 21228 USA

ISBN: (print) 9798350353013; 9798350353006

Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance. Although yielding impressive results, the impact of the language prior, particularly in terms of generalization and robustness, remains unexplored. In this paper, we address this gap by quantifying the impact of this prior and introduce methods to benchmark its effectiveness across various settings. We generate "low-level" sentences that convey object-centric, three-dimensional spatial relationships, incorporate them as additional language priors and evaluate their downstream impact on depth estimation. Our key finding is that current language-guided depth estimators perform optimally only with scene-level descriptions and counter-intuitively fare worse with low-level descriptions. Despite leveraging additional data, these methods are not robust to directed adversarial attacks and decline in performance with an increase in distribution shift. Finally, to provide a foundation for future research, we identify points of failures and offer insights to better understand these shortcomings. With an increasing number of methods using language for depth estimation, our findings highlight the opportunities and pitfalls that require careful consideration for effective deployment in real-world settings.

Keywords: Low-level vision; robustness; vision and language


Question Aware Vision Transformer for Multimodal Reasoning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ganz, Roy; Kittenplon, Yair; Aberdam, Aviad; Ben Avraham, Elad; Nuriel, Oren; Mazor, Shai; Litman, Ron

Affiliations: Technion Haifa Israel; AWS AI Labs Seattle WA 98019 USA

ISBN: (print) 9798350353006

Vision-Language (VL) models have gained significant research focus, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a vision encoder, a Large Language Model (LLM), and a projection module that aligns visual features with the LLM's representation space. Despite their success, a critical limitation persists: the vision encoding process remains decoupled from user queries, often in the form of image-related questions. Consequently, the resulting visual features may not be optimally attuned to the query-specific elements of the image. To address this, we introduce QA-ViT, a Question Aware Vision Transformer approach for multimodal reasoning, which embeds question awareness directly within the vision encoder. This integration results in dynamic visual features focusing on relevant image aspects to the posed question. QA-ViT is model-agnostic and can be incorporated efficiently into any VL architecture. Extensive experiments demonstrate the effectiveness of applying our method to various multimodal architectures, leading to consistent improvement across diverse tasks and showcasing its potential for enhancing visual and scene-text understanding.


PointInfinity: Resolution-Invariant Point Diffusion Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Zixuan; Johnson, Justin; Debnath, Shoubhik; Rehg, James M.; Wu, Chao-Yuan

Affiliations: Meta FAIR Menlo Pk CA 94025 USA; Univ Illinois Champaign IL 61820 USA

ISBN: (print) 9798350353006

We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference. More importantly, we show that scaling the test-time resolution beyond the training resolution improves the fidelity of generated point clouds and surfaces. We analyze this phenomenon and draw a link to classifier-free guidance commonly used in diffusion models, demonstrating that both allow trading off fidelity and variability during inference. Experiments on CO3D show that PointInfinity can efficiently generate high-resolution point clouds (up to 131k points, 31× more than Point-E) with state-of-the-art quality.

Keywords: 3D Diffusion Model; 3D Generation; 3D reconstruction; 3D vision; computer vision; Deep Learning


Can Biases in ImageNet Models Explain Generalization?

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gavrikov, Paul; Keuper, Janis

Affiliations: Offenburg Univ IMLA Offenburg Germany; Univ Mannheim Mannheim Germany

ISBN: (print) 9798350353006

The robust generalization of models to rare, in-distribution (ID) samples drawn from the long tail of the training distribution and to out-of-training-distribution (OOD) samples is one of the major challenges of current deep learning methods. For image classification, this manifests in the existence of adversarial attacks, the performance drops on distorted images, and a lack of generalization to concepts such as sketches. The current understanding of generalization in neural networks is very limited, but some biases that differentiate models from human vision have been identified and might be causing these limitations. Consequently, several attempts with varying success have been made to reduce these biases during training to improve generalization. We take a step back and sanity-check these attempts. Fixing the architecture to the well-established ResNet-50, we perform a large-scale study on 48 ImageNet models obtained via different training methods to understand how and if these biases - including shape bias, spectral biases, and critical bands - interact with generalization. Our extensive study results reveal that contrary to previous findings, these biases are insufficient to accurately predict the generalization of a model holistically. We provide access to all checkpoints and evaluation code at https://***/paulgavrikov/biases_vs_generalization/

Keywords: bias; computer vision; generalization; imagenet; robustness

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
