Search Results - Inner Mongolia University Library


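Help

Field codes:

T = title (book title, title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", C++ or Basic in any field, excluding subject term Visual, years 2011-2016)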
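Because the query language above is plain text, queries can be assembled programmatically. The following minimal Python sketch assumes only the field codes and operator rules listed in the help text. The helper names `term`, `group`, and `combine` are hypothetical and are not part of any library interface. The sketch rebuilds the two examples as a check.

```python
# Field codes listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
# Boolean operators; the rules require uppercase with one space on each side.
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expr})"


def combine(first: str, *rest: tuple[str, str]) -> str:
    """Join clauses left to right as 'first OP expr OP expr ...'.

    Each element of `rest` is an (operator, expression) pair. Operators
    must already be uppercase, as the syntax rules require.
    """
    parts = [first]
    for op, expr in rest:
        if op not in OPERATORS:
            raise ValueError(f"operator must be one of {sorted(OPERATORS)}: {op!r}")
        parts.extend([op, expr])
    # A single space on each side of every operator.
    return " ".join(parts)


example1 = combine(
    group(combine(term("K", "图书馆学"), ("OR", term("K", "情报学")))),
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
example2 = combine(
    term("P", "计算机应用与软件"),
    ("AND", group(combine(term("U", "C++"), ("OR", term("U", "Basic"))))),
    ("NOT", term("K", "Visual")),
    ("AND", term("Y", "2011-2016")),
)
print(example1)
print(example2)
```

The sketch only formats the string. Operator precedence and evaluation are left to the catalog's search engine, so parentheses should be added explicitly wherever grouping matters.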


Refine Results

Document Type

50,479 conference papers
1,421 books
1,041 journal articles
1 dissertation

Holdings Scope

52,940 electronic documents
4 print holdings

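Subject Classification

31,811 Engineering
- 24,804 Computer Science and Technology...
- 12,568 Software Engineering
- 5,153 Optical Engineering
- 4,756 Electrical Engineering
- 4,436 Information and Communication Engineering
- 4,257 Mechanical Engineering
- 3,956 Control Science and Engineering
- 2,474 Bioengineering
- 1,728 Biomedical Engineering (may confer...
- 1,584 Instrument Science and Technology
- 1,317 Electronic Science and Technology (may...
- 793 Chemical Engineering and Technology
- 698 Safety Science and Engineering
- 542 Transportation Engineering
- 379 Architecture
- 331 Civil Engineering
11,839 Science
- 6,434 Physics
- 5,405 Mathematics
- 2,761 Biology
- 1,910 Statistics (may confer Science, ...
- 801 Chemistry
- 669 Systems Science
5,305 Medicine
- 5,094 Clinical Medicine
- 729 Basic Medicine (may confer Medicine...
- 459 Pharmacy (may confer Medicine, ...
3,350 Management
- 1,953 Library, Information and Archives Manage...
- 1,535 Management Science and Engineering (may...
- 479 Business Administration
720 Art
- 718 Design (may confer Art...
428 Law
- 401 Sociology
297 Agriculture
197 Education
163 Economics
63 Literature
49 Military Science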

Topic

17,385 computer vision
9,017 pattern recognit...
4,196 training
3,815 feature extracti...
3,134 cameras
2,870 computational mo...
2,789 image segmentati...
2,622 visualization
2,573 shape
2,533 face recognition
2,171 robustness
2,123 computer science
1,973 object detection
1,959 computer archite...
1,878 layout
1,853 object recogniti...
1,802 three-dimensiona...
1,725 neural networks
1,708 humans
1,691 image recognitio...

Institution

165 univ chinese aca...
144 tsinghua univers...
136 national laborat...
108 univ sci & techn...
104 zhejiang univers...
100 shanghai jiao to...
95 microsoft resear...
94 university of sc...
86 zhejiang univ pe...
84 shanghai ai lab ...
74 school of comput...
69 computer vision ...
68 peking univ peop...
68 chinese acad sci...
65 chinese univ hon...
63 institute of inf...
62 google res mount...
61 univ oxford oxfo...
59 univ toronto on
57 swiss fed inst t...

Author

91 van gool luc
87 umapada pal
76 zhang lei
64 lee seong-whan
49 vittorio murino
42 yang yi
34 nassir navab
33 li xin
33 jie yang
32 liu yang
31 escalera sergio
31 loy chen change
30 ling haibin
30 h. bischof
29 zhou jie
29 vasconcelos nuno
29 jan-michael frah...
29 hanqing lu
28 blumenstein mich...
27 jia yunde

Language

51,871 English
835 Other
241 Chinese
22 Turkish
5 Spanish
2 Japanese
2 Portuguese
2 Russian

Search condition: "Any field=IEEE Conference on Computer Vision and Pattern Recognition"

52,943 records in total; showing records 101-110

Learning Correlation Structures for Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Manjin; Seo, Paul Hongsuck; Schmid, Cordelia; Cho, Minsu

Affiliations: POSTECH, Pohang, South Korea; Korea Univ, Seoul, South Korea; Google Res, Mountain View, CA, USA

ISBN: (print) 9798350353006

We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages rich structural patterns in images and videos such as scene layouts, object motion, and inter-object relations. Using StructSA as a main building block, we develop the structural vision transformer (StructViT) and evaluate its effectiveness on both image and video classification tasks, achieving state-of-the-art results on ImageNet-1K, Kinetics-400, Something-Something V1 & V2, Diving-48, and FineGym.

Keywords: correlation modeling; image classification; self-attention; video classification; Vision Transformers; visual representation learning

VLP: Vision Language Planning for Autonomous Driving

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Pan, Chenbin; Yaman, Burhaneddin; Nesti, Tommaso; Mallik, Abhirup; Allievi, Alessandro G.; Velipasalar, Senem; Rene, Liu

Affiliations: Syracuse Univ, Syracuse, NY, USA; Bosch Res North Amer & Bosch Ctr Artificial Intel, Sunnyvale, CA 94085, USA

ISBN: (print) 9798350353006

Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance through enhanced scene understanding, several key issues, including lack of reasoning, low generalization performance, and long-tail scenarios, still need to be addressed. In this paper, we present VLP, a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. VLP achieves state-of-the-art end-to-end planning performance on the challenging NuScenes dataset, reducing the average L2 error and the collision rate by 35.9% and 60.5%, respectively, compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization capabilities when faced with new urban environments.

Keywords: Visual languages

Making Vision Transformers Truly Shift-Equivariant

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rojas-Gomez, Renan A.; Lim, Teck-Yian; Do, Minh N.; Yeh, Raymond A.

Affiliations: UIUC, Dept Elect Engn, Urbana, IL 61801, USA; UIUC, VinUni Illinois Smart Hlth Ctr, Urbana, IL, USA; Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907, USA

ISBN: (print) 9798350353013; 9798350353006

In the field of computer vision, Vision Transformers (ViTs) have emerged as a prominent deep learning architecture. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs are susceptible to small spatial shifts in the input data: they lack shift-equivariance. To address this shortcoming, we introduce novel data-adaptive designs for each of the ViT modules that break shift-equivariance, such as tokenization, self-attention, patch merging, and positional encoding. With our proposed modules, we achieve perfect circular shift-equivariance across four prominent ViT architectures: Swin, SwinV2, CvT, and MViTv2. Additionally, we leverage our design to further enhance consistency under standard shifts. We evaluate our adaptive ViT models on image classification and semantic segmentation tasks. Our models achieve competitive performance across three diverse datasets, showcasing perfect (100%) circular shift consistency while improving standard shift consistency.

Keywords: shift equivariance; shift invariance; vision transformers
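The abstract distinguishes circular shift-equivariance from consistency under standard shifts. The circular property can be checked numerically: a map f is circularly shift-equivariant if shifting the input circularly shifts the output by a corresponding amount. Below is a minimal pure-Python sketch on 1-D signals. It assumes integer inputs so that comparisons are exact. The sketch shows three things:

- circular convolution passes the check;
- fixed stride-2 subsampling (the kind of downsampling found in patch tokenization or merging) fails it;
- an adaptive rule that keeps the polyphase component with the largest energy restores equivariance up to a circular shift of the output.

The max-norm rule is a simple stand-in chosen for illustration, not the paper's module, and all function names are hypothetical.

```python
def roll(x, s):
    """Circularly shift a list right by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


def circular_conv(x, k):
    """1-D circular convolution: y[i] = sum_j k[j] * x[(i - j) mod n]."""
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)]


def fixed_subsample(x):
    """Keep every second sample starting at index 0 (a fixed stride-2 grid)."""
    return x[0::2]


def adaptive_subsample(x):
    """Keep the stride-2 polyphase component with the largest squared norm.

    max() returns the first maximum, so ties go to phase 0; ties can break
    equivariance, and the demo input below has none.
    """
    phases = [x[0::2], x[1::2]]
    return max(phases, key=lambda p: sum(v * v for v in p))


def is_rotation(a, b):
    """True if b equals some circular shift of a."""
    return len(a) == len(b) and any(roll(a, s) == b for s in range(len(a)))


x = [3, 1, 4, 1, 5, 9, 2, 6]
shifted = roll(x, 1)
kernel = [1, 2, 1]

# Circular convolution commutes exactly with circular shifts.
conv_ok = circular_conv(shifted, kernel) == roll(circular_conv(x, kernel), 1)
# Fixed subsampling: a one-sample shift moves the input onto the other grid.
fixed_ok = is_rotation(fixed_subsample(x), fixed_subsample(shifted))
# Adaptive subsampling follows the shift, so outputs agree up to a rotation.
adaptive_ok = is_rotation(adaptive_subsample(x), adaptive_subsample(shifted))
print(conv_ok, fixed_ok, adaptive_ok)  # True False True
```

`is_rotation` accepts any circular shift of the output, which is weaker than requiring the exact matching shift. However, it is enough to show why a fixed downsampling grid breaks circular equivariance while a data-adaptive choice of grid can preserve it.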

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Yufan; Zhang, Jiaming; Peng, Kunyu; Zheng, Junwei; Liu, Ruiping; Torre, Philip; Stiefelhagen, Rainer

Affiliations: Karlsruhe Inst Technol, Karlsruhe, Germany; Univ Oxford, Oxford, England

ISBN: (print) 9798350353006

Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images of three datasets. To cover realistic corruptions, we propose a perturbation taxonomy with 12 common document perturbations with 3 severity levels inspired by real-world document processing. Additionally, to better understand document perturbation impacts, we propose two metrics, Mean Perturbation Effect (mPE) for perturbation assessment and Mean Robustness Degradation (mRD) for robustness evaluation. Furthermore, we introduce a self-titled model, i.e., Robust Document Layout Analyzer (RoDLA), which improves attention mechanisms to boost extraction of robust features. Experiments on the proposed benchmarks (PubLayNet-P, DocLayNet-P, and M6Doc-P) demonstrate that RoDLA obtains state-of-the-art mRD scores of 115.7, 135.4, and 150.4, respectively. Compared to previous methods, RoDLA achieves notable improvements in mAP of +3.8%, +7.1% and +12.1%, respectively.

Keywords: computer vision and pattern recognition; Document Analysis; Robustness

Large-Scale Bidirectional Training for Zero-Shot Image Captioning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Taehoon; Marsden, Mark; Ahn, Pyunghwan; Kim, Sangyun; Lee, Sihaeng; Sala, Alessandra; Kim, Seung Hwan

Affiliations: LG AI Res, Seoul, South Korea; Shutterstock, New York, NY, USA

ISBN: (print) 9798350365474

When trained on large-scale datasets, image captioning models can understand the content of images from a general domain but often fail to generate accurate, detailed captions. To improve performance, pretraining-and-finetuning has been a key strategy for image captioning. However, we find that large-scale bidirectional training between image and text enables zero-shot image captioning. In this paper, we introduce Bidirectional Image Text Training in largER Scale, BITTERS, an efficient training and inference framework for zero-shot image captioning. We also propose a new evaluation benchmark which comprises high-quality datasets and an extensive set of metrics to properly evaluate zero-shot captioning accuracy and societal bias. We additionally provide an efficient finetuning approach for keyword extraction. We show that careful selection of the large-scale training set and model architecture is the key to achieving zero-shot image captioning.

Keywords: image captioning; large-scale; multimodal transformers; vision-language; zero-shot

GRAFIQS: Face Image Quality Assessment Using Gradient Magnitudes

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kolf, Jan Niklas; Damer, Naser; Boutros, Fadi

Affiliations: Fraunhofer Inst Comp Graph Res IGD, Darmstadt, Germany; Tech Univ Darmstadt, Darmstadt, Germany

ISBN: (print) 9798350365474

Face Image Quality Assessment (FIQA) estimates the utility of face images for automated face recognition (FR) systems. We propose in this work a novel approach to assess the quality of face images based on inspecting the required changes in the pre-trained FR model weights to minimize differences between testing samples and the distribution of the FR training dataset. To achieve that, we propose quantifying the discrepancy in Batch Normalization statistics (BNS), including mean and variance, between those recorded during FR training and those obtained by processing testing samples through the pretrained FR model. We then generate gradient magnitudes of pretrained FR weights by backpropagating the BNS through the pretrained model. The cumulative absolute sum of these gradient magnitudes serves as the FIQ for our approach. Through comprehensive experimentation, we demonstrate the effectiveness of our training-free and quality labeling-free approach, achieving competitive performance to recent state-of-the-art FIQA approaches without relying on quality labeling, the need to train regression networks, specialized architectures, or designing and optimizing specific loss functions.

Keywords: Biometrics; computer vision; Face Image Quality Assessment; Face recognition
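The scoring idea in the GRAFIQS abstract can be illustrated on a toy model. This pure-Python sketch makes several assumptions:

- a single per-channel linear layer h_c = w_c * x_c stands in for a pretrained face recognition network;
- the discrepancy is a squared error between the test sample's channel mean and variance and the recorded training statistics;
- an analytic gradient replaces backpropagation through a real model.

The function names and the loss form are hypothetical and are not the paper's exact formulation. Only the final step, summing absolute gradient magnitudes, follows the abstract directly. In this toy, a larger score only means a larger mismatch with the recorded statistics.

```python
def channel_stats(xc, wc):
    """Statistics of one channel of the toy layer h_c = w_c * x_c.

    Returns (mean of h_c, population variance of h_c, mean of x_c, variance of x_c).
    """
    n = len(xc)
    mean_x = sum(xc) / n
    var_x = sum((a - mean_x) ** 2 for a in xc) / n
    return wc * mean_x, wc * wc * var_x, mean_x, var_x


def bns_gradient_score(x, w, run_mean, run_var):
    """Sum of |dL/dw_c| for L = sum_c (mu_c - run_mean_c)^2 + (var_c - run_var_c)^2.

    x: list of channels, each a list of activations for one test sample
    w: per-channel weights of the toy layer
    run_mean, run_var: statistics of h_c assumed to be recorded during training
    """
    score = 0.0
    for xc, wc, m, v in zip(x, w, run_mean, run_var):
        mu, var, mean_x, var_x = channel_stats(xc, wc)
        # Chain rule: d(mu)/dw = mean_x and d(var)/dw = 2 * w * var_x.
        grad = 2 * (mu - m) * mean_x + 2 * (var - v) * 2 * wc * var_x
        score += abs(grad)
    return score


w = [1.0, 0.5]
reference = [[1.0, 2.0, 3.0], [0.0, 4.0, 8.0]]
# Stand-in for recorded training statistics: taken from one reference sample.
stats = [channel_stats(xc, wc) for xc, wc in zip(reference, w)]
run_mean = [s[0] for s in stats]
run_var = [s[1] for s in stats]

print(bns_gradient_score(reference, w, run_mean, run_var))  # 0.0
print(bns_gradient_score([[5.0, 5.0, 5.0], [0.0, 0.0, 1.0]], w, run_mean, run_var) > 0)  # True
```

A sample whose statistics match the recorded ones needs no weight change and scores zero. A sample whose statistics drift produces nonzero gradients, which is the training-free, label-free signal the abstract describes.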

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cui, Jiequan; Zhu, Beier; Wen, Xin; Qi, Xiaojuan; Yu, Bei; Zhang, Hanwang

Affiliations: Nanyang Technol Univ, Singapore, Singapore; Univ Hong Kong, Hong Kong, Peoples R China; Chinese Univ Hong Kong, Hong Kong, Peoples R China

ISBN: (print) 9798350353006

In this paper, we present an empirical study on image recognition unfairness, i.e., extreme class accuracy disparity on balanced data like ImageNet. We demonstrate that classes are not equal and that unfairness is prevalent for image classification models across various datasets, network architectures, and model capacities. Moreover, several intriguing properties of fairness are identified. First, the unfairness lies in problematic representation rather than classifier bias, which distinguishes it from long-tailed recognition. Second, with the proposed concept of Model Prediction Bias, we investigate the origins of problematic representation during training optimization. Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize. This means that samples from more of the other classes will be confused with the harder classes. The False Positives (FPs) then dominate the learning in optimization, leading to the poor accuracy of those classes. Further, we conclude that data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.

Keywords: Fairness; Long-tailed recognition; Representation Learning; Vision-Language Models
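The disparity studied in this paper starts from per-class accuracy on a balanced evaluation set. Below is a minimal Python sketch, assuming integer class labels. `per_class_report` is a hypothetical helper. Its per-class false-positive count is a simple proxy for how often other classes are confused with a given class. It is not the paper's Model Prediction Bias metric.

```python
from collections import Counter


def per_class_report(labels, preds):
    """Per-class accuracy, per-class false positives, and the accuracy gap.

    labels, preds: equal-length sequences of integer class ids
    """
    classes = sorted(set(labels))
    support = Counter(labels)
    correct = Counter(y for y, p in zip(labels, preds) if y == p)
    # A false positive for class c: the model predicted c, but the true label differs.
    false_pos = Counter(p for y, p in zip(labels, preds) if y != p)
    acc = {c: correct[c] / support[c] for c in classes}
    fps = {c: false_pos[c] for c in classes}
    # Gap between the best and worst class: one simple measure of disparity.
    gap = max(acc.values()) - min(acc.values())
    return acc, fps, gap


# Balanced toy set: four samples per class.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
preds = [0, 0, 0, 0, 1, 1, 1, 2, 2, 1, 1, 0]
acc, fps, gap = per_class_report(labels, preds)
print(acc)  # {0: 1.0, 1: 0.75, 2: 0.25}
print(fps)  # {0: 1, 1: 2, 2: 1}
print(gap)  # 0.75
```

Even though every class has the same number of samples, overall accuracy (8/12 here) hides a 0.75 gap between the best and worst class. That balanced-data disparity is the kind of unfairness the abstract describes.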

Vision-and-Language Navigation via Causal Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Liuyi; He, Zongtao; Dang, Ronghao; Shen, Mengjiao; Liu, Chengju; Chen, Qijun

Affiliations: Tongji Univ, Sch Elect & Informat Engn, Shanghai, Peoples R China

ISBN: (print) 9798350353006

In the pursuit of robust and generalizable environment perception and language understanding, the ubiquitous challenge of dataset bias continues to plague vision-and-language navigation (VLN) agents, hindering their performance in unseen environments. This paper introduces the generalized cross-modal causal transformer (GOAT), a pioneering solution rooted in the paradigm of causal inference. By delving into both observable and unobservable confounders within vision, language, and history, we propose the back-door and front-door adjustment causal learning (BACL and FACL) modules to promote unbiased learning by comprehensively mitigating potential spurious correlations. Additionally, to capture global confounder features, we propose a cross-modal feature pooling (CFP) module supervised by contrastive learning, which is also shown to be effective in improving cross-modal representations during pre-training. Extensive experiments across multiple VLN datasets (R2R, REVERIE, RxR, and SOON) underscore the superiority of our proposed method over previous state-of-the-art approaches. Code is available at https://***/CrystalSixone/VLN-GOAT.

Keywords: causal learning; cross-modal fusion; embodied AI; vision-and-language; vision-and-language navigation
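The back-door adjustment named in this abstract is, in its textbook form, P(y | do(x)) = Σ_z P(y | x, z) P(z), where z is an observed confounder. By contrast, the observational quantity P(y | x) = Σ_z P(y | x, z) P(z | x) mixes in the confounder's dependence on x. Below is a minimal sketch on a toy discrete distribution. It assumes binary x, y, and z, with made-up probabilities chosen only for illustration; none of them come from the paper. The code illustrates only the standard adjustment formula and does not reproduce the paper's BACL module.

```python
from fractions import Fraction as F

# Toy distribution (illustrative numbers): confounder z influences both x and y.
p_z = {0: F(1, 2), 1: F(1, 2)}                        # P(z)
p_x1_given_z = {0: F(1, 5), 1: F(4, 5)}               # P(x=1 | z)
p_y1_given_xz = {(0, 0): F(1, 10), (0, 1): F(1, 2),   # P(y=1 | x, z)
                 (1, 0): F(3, 10), (1, 1): F(7, 10)}


def p_x_given_z(x, z):
    """P(x | z) for binary x."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def backdoor(x):
    """Interventional P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


def observational(x):
    """P(y=1 | x) = sum_z P(y=1 | x, z) P(z | x), with P(z | x) from Bayes' rule."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / p_x for z in p_z)


print(backdoor(1), observational(1))  # 1/2 31/50
```

Exact fractions keep the comparison free of rounding. In this toy, conditioning on x = 1 overstates the effect (31/50 versus 1/2) because x = 1 is more likely when z = 1, which also raises y. This is the kind of spurious correlation the adjustment removes.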

Generating Diverse Agricultural Data for Vision-Based Farming Applications

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cieslak, Mikolaj; Govindarajan, Umabharathi; Garcia, Alejandro; Chandrashekar, Anuradha; Haedrich, Torsten; Mendoza-Drosik, Aleksander; Michels, Dominik L.; Pirk, Soeren; Fu, Chia-Chun; Palubicki, Wojciech

Affiliations: GreenMatterAI, Berlin, Germany; Blue River Technol, Santa Clara, CA, USA; King Abdullah Univ Sci & Technol, Thuwal, Saudi Arabia; Tech Univ Darmstadt, Darmstadt, Germany; Christian Albrecht Univ Kiel, Kiel, Germany; Adam Mickiewicz Univ, Poznan, Poland

ISBN: (print) 9798350365474

We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. The model simulates distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural generation process enhances the photorealism and applicability of the synthetic data. We validate our model's effectiveness by comparing the synthetic data against real agricultural images, demonstrating its potential to significantly augment training data for machine learning models in agriculture. This approach not only provides a cost-effective solution for generating high-quality, diverse data but also addresses specific needs in agricultural vision tasks that are not fully covered by general-purpose models.

Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bastian, Lennart; Xie, Yizheng; Navab, Nassir; Laehner, Zorah

Affiliations: Tech Univ Munich, Munich, Germany; Univ Siegen, Siegen, Germany; Univ Bonn, Bonn, Germany; Lamarr Inst, Bonn, Germany

ISBN: (print) 9798350353013; 9798350353006

Non-isometric shape correspondence remains a fundamental challenge in computer vision. Traditional methods using Laplace-Beltrami operator (LBO) eigenmodes face limitations in characterizing high-frequency extrinsic shape changes like bending and creases. We propose a novel approach of combining the non-orthogonal extrinsic basis of eigenfunctions of the elastic thin-shell Hessian with the intrinsic ones of the LBO, creating a hybrid spectral space in which we construct functional maps. To this end, we present a theoretical framework to effectively integrate non-orthogonal basis functions into descriptor- and learning-based functional map methods. Our approach can be incorporated easily into existing functional map pipelines across varying applications and can handle complex deformations beyond isometries. We show extensive evaluations across various supervised and unsupervised settings and demonstrate significant improvements. Notably, our approach achieves up to 15% better mean geodesic error for non-isometric correspondence settings and up to 45% improvement in scenarios with topological noise. Code is available at: https://***/

Keywords: computer vision; Functional Maps; Non-isometric Shape Correspondence; Shape Matching; Topological Noise
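The abstract turns on the hybrid basis being non-orthogonal. A general linear-algebra consequence, independent of this paper, explains why that matters. With an orthonormal basis, a function's coefficients are plain inner products. With a non-orthogonal basis B, the coefficients must be found by solving the Gram system (BᵀB) a = Bᵀf, which are the least-squares normal equations. Below is a minimal pure-Python sketch. It assumes an identity mass matrix, using plain dot products instead of the area-weighted inner product used on meshes. `solve` and `project` are hypothetical helpers. The step that then estimates the functional map from such coefficients is not shown.

```python
def dot(u, v):
    """Plain inner product (identity mass matrix assumed)."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def project(f, basis):
    """Least-squares coefficients of f in a (possibly non-orthogonal) basis."""
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]
    rhs = [dot(bi, f) for bi in basis]
    return solve(gram, rhs)


basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # not orthogonal: <b1, b2> = 1
f = [2.0, 3.0, 0.0]                         # f = -1 * b1 + 3 * b2
naive = [dot(b, f) for b in basis]          # valid only for orthonormal bases
coeffs = project(f, basis)
print(naive, coeffs)  # [2.0, 5.0] [-1.0, 3.0]
```

The naive inner products would reconstruct 2 * b1 + 5 * b2 = [7, 5, 0] instead of f. The Gram-matrix solve recovers the correct coefficients. On a mesh, the same idea applies with the mass matrix inserted into every inner product.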

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
