ISBN (Print): 9798350353006
Vision-Language Models (VLMs), such as CLIP, exhibit strong image-text comprehension abilities, facilitating advances in several downstream tasks such as zero-shot image classification, image-text retrieval, and text-to-image generation. However, the compositional reasoning abilities of existing VLMs remain subpar. The root of this limitation lies in the inadequate alignment between the images and captions in the pretraining datasets. Additionally, the current contrastive learning objective fails to focus on fine-grained grounding components like relations, actions, and attributes, resulting in "bag-of-words" representations. We introduce a simple and effective method to improve compositional reasoning in VLMs. Our method better leverages available datasets by refining and expanding the standard image-text contrastive learning framework. Our approach does not require specific annotations and does not incur extra parameters. When integrated with CLIP, our technique yields notable improvement over state-of-the-art baselines across five vision-language compositional benchmarks.
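As a point of reference, the framework being refined is the standard symmetric image-text contrastive (InfoNCE) objective used in CLIP-style pretraining. Below is a minimal sketch of that baseline loss; the paper's specific refinements and expansions are not reproduced here, and the tensor names are illustrative.

```python
# Minimal sketch of the standard symmetric image-text contrastive (InfoNCE)
# objective used in CLIP-style pretraining; the paper refines and expands
# this baseline, but those refinements are not shown here.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """image_emb, text_emb: (batch, dim) embeddings from the two encoders."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)              # image -> matching caption
    loss_t2i = F.cross_entropy(logits.t(), targets)          # caption -> matching image
    return 0.5 * (loss_i2t + loss_t2i)

# Usage with random embeddings standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt))
```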
ISBN (Print): 9798350365474
We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. The model simulates distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural generation process enhances the photorealism and applicability of the synthetic data. We validate our model's effectiveness by comparing the synthetic data against real agricultural images, demonstrating its potential to significantly augment training data for machine learning models in agriculture. This approach not only provides a cost-effective solution for generating high-quality, diverse data but also addresses specific needs in agricultural vision tasks that are not fully covered by general-purpose models.
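As a rough illustration of what "procedural" means here, the sketch below samples randomized scene parameters of the kind the abstract lists (growth stage, soil, field arrangement, lighting). The parameter names and value ranges are hypothetical placeholders, not the paper's actual configuration.

```python
# Hypothetical sketch of sampling randomized parameters for a procedural
# field generator; names and ranges are illustrative, not the paper's.
import random
from dataclasses import dataclass

@dataclass
class SceneConfig:
    growth_stage: str          # e.g. vegetative vs. reproductive soybean stages
    soil_type: str
    row_spacing_m: float
    weed_density_per_m2: float
    sun_elevation_deg: float

GROWTH_STAGES = ["VE", "V2", "V6", "R1", "R4"]
SOIL_TYPES = ["loam", "clay", "sandy", "silt"]

def sample_scene(rng: random.Random) -> SceneConfig:
    # Each call yields one randomized field configuration to render.
    return SceneConfig(
        growth_stage=rng.choice(GROWTH_STAGES),
        soil_type=rng.choice(SOIL_TYPES),
        row_spacing_m=rng.uniform(0.35, 0.75),
        weed_density_per_m2=rng.uniform(0.0, 12.0),
        sun_elevation_deg=rng.uniform(15.0, 80.0),
    )

rng = random.Random(0)
print(sample_scene(rng))
```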
ISBN (Print): 9798350353006
We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages rich structural patterns in images and videos such as scene layouts, object motion, and inter-object relations. Using StructSA as a main building block, we develop the structural vision transformer (StructViT) and evaluate its effectiveness on both image and video classification tasks, achieving state-of-the-art results on ImageNet-1K, Kinetics-400, Something-Something V1 & V2, Diving-48, and FineGym.
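A heavily simplified, hypothetical reading of the mechanism: treat each query's row of key-query correlations as a spatial map, convolve it to pick up local structure, and use the result as attention weights over the values. The sketch below illustrates this for a 2D token grid; the shapes, single-head setup, and convolutional design are assumptions, not the paper's implementation.

```python
# Simplified, hypothetical sketch of structure-aware attention for 2D tokens:
# key-query correlation maps are treated as spatial images and convolved
# before being used as attention weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedStructuralAttention(nn.Module):
    def __init__(self, dim: int, hw: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.hw = hw                                   # spatial side length (H == W assumed)
        # Convolution over each query's correlation map to capture local structure.
        self.struct_conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) with N == hw * hw
        B, N, C = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        corr = q @ k.transpose(1, 2) / C ** 0.5        # (B, N, N) key-query correlations
        # View each query's correlation row as an hw x hw map and convolve it.
        maps = corr.reshape(B * N, 1, self.hw, self.hw)
        maps = self.struct_conv(maps).reshape(B, N, N)
        attn = F.softmax(maps, dim=-1)
        return attn @ v                                # aggregate value features

layer = SimplifiedStructuralAttention(dim=64, hw=7)
out = layer(torch.randn(2, 49, 64))
print(out.shape)   # torch.Size([2, 49, 64])
```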
ISBN (Print): 9798350353006
Before deploying a Document Layout Analysis (DLA) model in real-world applications, comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images drawn from three datasets. To cover realistic corruptions, we propose a perturbation taxonomy of 12 common document perturbations, each with 3 severity levels, inspired by real-world document processing. Additionally, to better understand the impact of document perturbations, we propose two metrics: Mean Perturbation Effect (mPE) for perturbation assessment and Mean Robustness Degradation (mRD) for robustness evaluation. Furthermore, we introduce the Robust Document Layout Analyzer (RoDLA), which improves attention mechanisms to boost the extraction of robust features. Experiments on the proposed benchmarks (PubLayNet-P, DocLayNet-P, and M6Doc-P) demonstrate that RoDLA obtains state-of-the-art mRD scores of 115.7, 135.4, and 150.4, respectively. Compared to previous methods, RoDLA achieves notable improvements in mAP of +3.8%, +7.1%, and +12.1%, respectively.
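The abstract does not give the formulas for mPE and mRD, but robustness metrics of this kind are often computed as performance drops under perturbation, normalized by a reference baseline (in the spirit of ImageNet-C's mCE). The sketch below shows one such formulation purely for illustration; the paper's exact definitions may differ.

```python
# Hedged sketch of a baseline-normalized robustness-degradation metric;
# treat these formulas as illustrative, not as the paper's mPE/mRD definitions.
from statistics import mean

def mean_relative_degradation(clean_map: float,
                              perturbed_maps: dict[str, list[float]],
                              baseline_clean_map: float,
                              baseline_perturbed_maps: dict[str, list[float]]) -> float:
    """Average, over perturbation types and severities, of the evaluated model's
    mAP drop normalized by the drop of a reference baseline model."""
    ratios = []
    for ptype, scores in perturbed_maps.items():
        for severity, score in enumerate(scores):
            drop = clean_map - score
            baseline_drop = baseline_clean_map - baseline_perturbed_maps[ptype][severity]
            if baseline_drop > 0:
                ratios.append(drop / baseline_drop)
    return 100.0 * mean(ratios)

# Toy numbers: two perturbation types, three severity levels each.
model = {"blur": [0.80, 0.74, 0.65], "noise": [0.82, 0.77, 0.70]}
baseline = {"blur": [0.75, 0.66, 0.55], "noise": [0.78, 0.70, 0.60]}
print(mean_relative_degradation(0.88, model, 0.86, baseline))
```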
ISBN (Print): 9798350353013; 9798350353006
Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present PEEKABOO, a novel masked attention module, which seamlessly integrates with current video generation models, offering control without additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation. This benchmark offers a standardized framework for the community to assess the efficacy of emerging interactive video generation models. Our extensive qualitative and quantitative assessments reveal that PEEKABOO achieves up to a 3.8x improvement in mIoU over baseline models, all while maintaining the same latency. Code and benchmark are available on the project webpage.
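As a schematic of what a masked attention module does in this setting, the sketch below gates cross-attention logits with a binary region mask so that patches outside a user-specified box ignore the controlled object's text tokens. The interface and masking pattern are illustrative assumptions, not PEEKABOO's exact design.

```python
# Hypothetical sketch of masked cross-attention: a binary mask suppresses
# attention between patch queries outside a user box and the object's text
# tokens. Illustrates the general mechanism only.
import torch
import torch.nn.functional as F

def masked_cross_attention(q: torch.Tensor,            # (B, Nq, C) patch queries
                           k: torch.Tensor,            # (B, Nk, C) text keys
                           v: torch.Tensor,            # (B, Nk, C) text values
                           region_mask: torch.Tensor   # (B, Nq, Nk), 1 = allowed
                           ) -> torch.Tensor:
    scale = q.size(-1) ** -0.5
    logits = q @ k.transpose(1, 2) * scale
    logits = logits.masked_fill(region_mask == 0, float("-inf"))
    return F.softmax(logits, dim=-1) @ v

B, Nq, Nk, C = 1, 16, 4, 32
mask = torch.ones(B, Nq, Nk)
mask[:, 8:, 1:] = 0          # patches outside the region ignore object tokens
out = masked_cross_attention(torch.randn(B, Nq, C),
                             torch.randn(B, Nk, C),
                             torch.randn(B, Nk, C), mask)
print(out.shape)             # torch.Size([1, 16, 32])
```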
ISBN (Print): 9798350365474
When trained on large-scale datasets, image captioning models can understand the content of images from a general domain but often fail to generate accurate, detailed captions. To improve performance, pretraining-and-finetuning has been a key strategy for image captioning. However, we find that large-scale bidirectional training between image and text enables zero-shot image captioning. In this paper, we introduce Bidirectional Image Text Training in largER Scale (BITTERS), an efficient training and inference framework for zero-shot image captioning. We also propose a new evaluation benchmark, which comprises high-quality datasets and an extensive set of metrics to properly evaluate zero-shot captioning accuracy and societal bias. We additionally provide an efficient finetuning approach for keyword extraction. We show that careful selection of the large-scale training set and model architecture is key to achieving zero-shot image captioning.
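Schematically, a bidirectional objective of this kind combines two generation losses: caption tokens predicted from the image and discrete image tokens predicted from the text. The sketch below shows only that combination with placeholder logits standing in for hypothetical decoders; it is not the BITTERS training loop.

```python
# Hedged, schematic sketch of a bidirectional image-text objective: one
# cross-entropy term per direction. The logits below are placeholders for
# the outputs of hypothetical image->text and text->image decoders.
import torch
import torch.nn.functional as F

batch, cap_len, img_len = 4, 12, 16
text_vocab, image_vocab = 1000, 512

caption_logits = torch.randn(batch, cap_len, text_vocab)   # from an image->text decoder
image_logits = torch.randn(batch, img_len, image_vocab)    # from a text->image decoder
caption_targets = torch.randint(0, text_vocab, (batch, cap_len))
image_targets = torch.randint(0, image_vocab, (batch, img_len))

loss_i2t = F.cross_entropy(caption_logits.flatten(0, 1), caption_targets.flatten())
loss_t2i = F.cross_entropy(image_logits.flatten(0, 1), image_targets.flatten())
bidirectional_loss = loss_i2t + loss_t2i
print(bidirectional_loss.item())
```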
ISBN (Print): 9798350365474
Deep learning architectures like Transformers and Convolutional Neural Networks (CNNs) have led to ground-breaking advances across numerous fields. However, their extensive need for parameters poses challenges for deployment in resource-limited environments. In our research, we propose a strategy that exploits the column and row spaces of weight matrices, significantly reducing the number of required model parameters without substantially affecting performance. This technique is applied to both Bottleneck and Attention layers, achieving a notable reduction in parameters with minimal impact on model efficacy. Our proposed model, HaLViT, exemplifies a parameter-efficient Vision Transformer. Through rigorous experiments on the ImageNet and COCO datasets, HaLViT's performance validates the effectiveness of our method, offering results comparable to those of conventional models.
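One common way to exploit the row and column spaces of a weight matrix is a low-rank factorization W ≈ UV, sketched below purely to illustrate the parameter accounting; the paper's actual construction for Bottleneck and Attention layers may differ.

```python
# Hedged sketch: replace a dense linear layer with a low-rank factorization
# to cut parameters. Illustrates the parameter accounting only; not claimed
# to be HaLViT's actual construction.
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = nn.Linear(in_features, rank, bias=False)   # restricts the row space to rank r
        self.V = nn.Linear(rank, out_features, bias=True)   # restricts the column space to rank r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.V(self.U(x))

dense = nn.Linear(768, 768)
lowrank = LowRankLinear(768, 768, rank=64)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(lowrank))   # 590592 vs. 99072 parameters
```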
ISBN (Print): 9798350353006
While Transformers have rapidly gained popularity in various computer vision applications, post-hoc explanations of their internal mechanisms remain largely unexplored. Vision Transformers extract visual information by representing image regions as transformed tokens and integrating them via attention weights. However, existing post-hoc explanation methods merely consider these attention weights, neglecting crucial information from the transformed tokens, and thus fail to accurately illustrate the rationales behind the models' predictions. To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects. Specifically, we quantify token transformation effects by measuring changes in token lengths and correlations in their directions pre- and post-transformation. Moreover, we develop initialization and aggregation rules to integrate both attention weights and token transformation effects across all layers, capturing holistic token contributions throughout the model. Experimental results on segmentation and perturbation tests demonstrate the superiority of our proposed TokenTM compared to state-of-the-art Vision Transformer explanation methods.
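A sketch of the kind of measurement described: compare each token before and after a block via its change in length (norm ratio) and its directional agreement (cosine similarity), then combine the per-token effect with attention weights. The aggregation below is a simplified placeholder and does not reproduce TokenTM's initialization and aggregation rules.

```python
# Hedged sketch of measuring token transformation effects via length change
# and directional correlation, then weighting by attention.
import torch
import torch.nn.functional as F

def token_transformation_scores(tokens_in: torch.Tensor,   # (B, N, C) pre-block tokens
                                tokens_out: torch.Tensor,  # (B, N, C) post-block tokens
                                attn: torch.Tensor         # (B, N, N) attention weights
                                ) -> torch.Tensor:
    length_change = tokens_out.norm(dim=-1) / tokens_in.norm(dim=-1).clamp_min(1e-6)  # (B, N)
    direction = F.cosine_similarity(tokens_in, tokens_out, dim=-1)                    # (B, N)
    transform_effect = length_change * direction                                      # per-token effect
    # Distribute per-token effects through the attention received by [CLS] (index 0).
    return attn[:, 0, :] * transform_effect                                           # (B, N) contributions

B, N, C = 2, 10, 32
x_in, x_out = torch.randn(B, N, C), torch.randn(B, N, C)
attn = torch.softmax(torch.randn(B, N, N), dim=-1)
print(token_transformation_scores(x_in, x_out, attn).shape)   # torch.Size([2, 10])
```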
ISBN (Print): 9798350353006
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of visual tokens, crucial for MLLMs' overall efficiency, and (ii) preservation of local context from visual features, vital for spatial understanding. Based on these findings, we propose a novel projector design that is both flexible and locality-enhanced, effectively satisfying the two desirable properties. Additionally, we present comprehensive strategies to effectively utilize multiple and multifaceted instruction datasets. Through extensive experiments, we examine the impact of individual design choices. Finally, our proposed MLLM, Honeybee, remarkably outperforms previous state-of-the-art methods across various benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench, achieving significantly higher efficiency. Code and models are available at https://github.com/kakaobrain/honeybee.
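To make the two properties concrete, the sketch below shows a projector that controls the number of output visual tokens with adaptive pooling while a convolution preserves local context. It only illustrates the stated properties and is not claimed to be Honeybee's actual projector design.

```python
# Hedged sketch of a projector with an adjustable output token count and
# locality preservation; the exact Honeybee design is not reproduced here.
import torch
import torch.nn as nn

class LocalityAwareProjector(nn.Module):
    def __init__(self, vis_dim: int, llm_dim: int, out_tokens: int):
        super().__init__()
        self.out_side = int(out_tokens ** 0.5)               # assume a square token grid
        self.conv = nn.Conv2d(vis_dim, vis_dim, kernel_size=3, padding=1)  # keeps local context
        self.pool = nn.AdaptiveAvgPool2d(self.out_side)       # controls the token count
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, vis_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (B, N, C) from the vision encoder, N assumed to be a square number
        B, N, C = vis_tokens.shape
        side = int(N ** 0.5)
        x = vis_tokens.transpose(1, 2).reshape(B, C, side, side)
        x = self.pool(self.conv(x))                            # (B, C, out_side, out_side)
        x = x.flatten(2).transpose(1, 2)                       # (B, out_tokens, C)
        return self.proj(x)                                    # (B, out_tokens, llm_dim)

proj = LocalityAwareProjector(vis_dim=1024, llm_dim=4096, out_tokens=64)
print(proj(torch.randn(1, 576, 1024)).shape)   # torch.Size([1, 64, 4096])
```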
ISBN (Print): 9798350353013; 9798350353006
In the field of computer vision, Vision Transformers (ViTs) have emerged as a prominent deep learning architecture. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs are susceptible to small spatial shifts in the input data: they lack shift-equivariance. To address this shortcoming, we introduce novel data-adaptive designs for each of the ViT modules that break shift-equivariance, such as tokenization, self-attention, patch merging, and positional encoding. With our proposed modules, we achieve perfect circular shift-equivariance across four prominent ViT architectures: Swin, SwinV2, CvT, and MViTv2. Additionally, we leverage our design to further enhance consistency under standard shifts. We evaluate our adaptive ViT models on image classification and semantic segmentation tasks. Our models achieve competitive performance across three diverse datasets, showcasing perfect (100%) circular shift consistency while improving standard shift consistency.
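The circular shift-consistency criterion can be checked directly: circularly shift the input and verify the prediction is unchanged. Below is a minimal sketch of such a check, with a toy shift-invariant model standing in for an adaptive ViT.

```python
# Minimal sketch of a circular shift-consistency check; a perfectly
# shift-equivariant classifier should score 100% on this test.
import torch
import torch.nn as nn

def circular_shift_consistency(model: nn.Module,
                               images: torch.Tensor,            # (B, C, H, W)
                               shift: tuple[int, int] = (7, 13)) -> float:
    model.eval()
    with torch.no_grad():
        preds = model(images).argmax(dim=-1)
        shifted = torch.roll(images, shifts=shift, dims=(-2, -1))  # circular spatial shift
        preds_shifted = model(shifted).argmax(dim=-1)
    return (preds == preds_shifted).float().mean().item()

# Toy example: 1x1 conv + global average pooling is circular-shift invariant.
toy = nn.Sequential(nn.Conv2d(3, 10, 1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
print(circular_shift_consistency(toy, torch.randn(4, 3, 32, 32)))   # expected: 1.0
```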