ISBN:
(Print) 9798350301298
Compositionality is one of the fundamental properties of human cognition (Fodor & Pylyshyn, 1988). Compositional generalization is critical to simulating the compositional capability of humans, and has received much attention in the vision-and-language (V&L) community. It is essential to understand the effect of primitives, including words, image regions, and video frames, in order to improve compositional generalization capability. In this paper, we explore the effect of primitives for compositional generalization in V&L. Specifically, we present a self-supervised learning-based framework that equips existing V&L methods with two characteristics: semantic equivariance and semantic invariance. With these two characteristics, the methods understand primitives by perceiving the effect of primitive changes on sample semantics and ground truth. Experimental results on two tasks, temporal video grounding and visual question answering, demonstrate the effectiveness of our framework.
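The equivariance/invariance idea lends itself to a simple regularizer. Below is a minimal sketch assuming a generic V&L model and hypothetical sample variants (one with a semantically irrelevant primitive change, one with a semantics-changing change); the margin-based form is an assumption, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def primitive_consistency_loss(model, sample, equivalent_variant, altered_variant, margin=1.0):
    """Hedged sketch of semantic invariance/equivariance regularizers.

    `sample` is an original V&L input; `equivalent_variant` has a semantically
    irrelevant primitive changed (prediction should not move), while
    `altered_variant` has a semantics-changing primitive edit (prediction
    should move). All three inputs are hypothetical placeholders.
    """
    z = model(sample)                   # prediction logits or an embedding
    z_inv = model(equivalent_variant)
    z_eq = model(altered_variant)

    # Invariance: keep outputs close when sample semantics are unchanged.
    loss_inv = F.mse_loss(z_inv, z)

    # Equivariance (contrastive flavour): push outputs apart, up to a margin,
    # when the primitive edit changes the sample's meaning.
    loss_eq = F.relu(margin - F.mse_loss(z_eq, z))

    return loss_inv + loss_eq
```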
ISBN:
(Print) 9798350301298
Visual place recognition (VPR) is a fundamental task of computer vision for visual localization. Existing methods are trained using image pairs that either depict the same place or not. Such a binary indication does not consider continuous relations of similarity between images of the same place taken from different positions, determined by the continuous nature of camera pose. The binary similarity induces a noisy supervision signal into the training of VPR methods, causing them to stall in local minima and require expensive hard-pair mining algorithms to guarantee convergence. Motivated by the fact that two images of the same place only partially share visual cues due to camera pose differences, we deploy an automatic re-annotation strategy to re-label VPR datasets. We compute graded similarity labels for image pairs based on available localization metadata. Furthermore, we propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks. We demonstrate that the use of the new labels and the GCL allows us to dispense with hard-pair mining and to train image descriptors that perform better in VPR by nearest neighbor search, obtaining results superior or comparable to those of methods that require expensive hard-pair mining and re-ranking techniques.
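A graded similarity label can replace the binary label directly in a contrastive objective. The following is a minimal PyTorch sketch of such a graded contrastive loss; the margin value and exact weighting are assumptions rather than the paper's published GCL formulation.

```python
import torch
import torch.nn.functional as F

def generalized_contrastive_loss(desc_a, desc_b, similarity, margin=0.5):
    """Sketch of a graded contrastive loss for VPR descriptors.

    `similarity` is a graded label in [0, 1] derived from localization metadata
    (1.0 = same place and pose, 0.0 = unrelated places). Margin and weighting
    are assumptions for illustration.
    """
    d = F.pairwise_distance(desc_a, desc_b)                  # Euclidean descriptor distance
    attract = similarity * d.pow(2)                          # pull pairs together in proportion to similarity
    repel = (1.0 - similarity) * F.relu(margin - d).pow(2)   # push dissimilar pairs beyond the margin
    return 0.5 * (attract + repel).mean()
```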
ISBN:
(Print) 9798350365474
In this research, a novel approach for autonomous spacecraft navigation, particularly in lunar contexts, is presented, focusing on vision-based techniques. The system combines lunar crater recognition with feature tracking to enhance the accuracy of spacecraft navigation. It was evaluated comprehensively in a purpose-built software simulation that replicates lunar conditions, allowing thorough testing and refinement. The methodology integrates established navigational methods with artificial intelligence algorithms, yielding high navigational accuracy. The system determines the spacecraft position with an average accuracy of approximately 270 m in the absolute navigation mode, while the relative mode exhibits average errors of 27.4 m and 0.8 m in determining the horizontal and vertical lander displacements relative to the terrain. Initial tests were conducted on embedded systems akin to those on board spacecraft. These tests are pivotal in demonstrating the system's operational viability within the limited-bandwidth and rapid-processing constraints characteristic of space missions. The promising results suggest potential applicability in real-world space missions, enhancing autonomous navigation capabilities in lunar and potentially other extraterrestrial environments.
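As a rough illustration only, one way to combine an absolute crater-based fix with a relative feature-tracking displacement is a simple complementary blend; the filter below and its weight are assumptions, not the navigation filter used in the paper.

```python
import numpy as np

def blend_position_estimates(absolute_fix, relative_delta, last_position, alpha=0.7):
    """Hedged sketch: fuse an absolute crater-recognition fix with a relative
    feature-tracking displacement via a complementary blend. The blend form
    and the weight `alpha` are illustrative assumptions."""
    dead_reckoned = np.asarray(last_position) + np.asarray(relative_delta)  # propagate with feature tracking
    return alpha * dead_reckoned + (1.0 - alpha) * np.asarray(absolute_fix)  # correct with the absolute fix
```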
ISBN:
(Print) 9798350301298
In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time, which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing, and present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms for acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin on three publicly available SITS semantic segmentation and classification datasets. All model, training and evaluation codes can be found at https://***/michaeltrs/DeepSatModels.
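The temporal-then-spatial factorization can be sketched as two stacked Transformer encoders, one attending over acquisition dates per patch location and one over patch locations per date. The layer sizes and token layout below are assumptions, not TSViT's exact configuration.

```python
import torch
import torch.nn as nn

class FactorizedTemporoSpatialEncoder(nn.Module):
    """Minimal sketch of a temporal-then-spatial factorized encoder for SITS."""

    def __init__(self, dim=128, depth=2, heads=4):
        super().__init__()
        t_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        s_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(t_layer, depth)  # attends across acquisition dates
        self.spatial = nn.TransformerEncoder(s_layer, depth)   # attends across patch locations

    def forward(self, tokens):
        # tokens: (batch, time, space, dim) patch embeddings of a SITS record
        b, t, s, d = tokens.shape
        x = tokens.permute(0, 2, 1, 3).reshape(b * s, t, d)    # temporal attention per location
        x = self.temporal(x)
        x = x.reshape(b, s, t, d).permute(0, 2, 1, 3).reshape(b * t, s, d)  # spatial attention per date
        x = self.spatial(x)
        return x.reshape(b, t, s, d)
```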
ISBN:
(Print) 9798350301298
Fashion representation learning involves the analysis and understanding of various visual elements at different granularities and the interactions among them. Existing works often learn fine-grained fashion representations at the attribute level without considering their relationships and inter-dependencies across different classes. In this work, we propose to learn an attribute and class-specific fashion representation duet to better model such attribute relationships and inter-dependencies by leveraging prior knowledge about the taxonomy of fashion attributes and classes. Through two sub-networks for the attributes and classes, respectively, our proposed embedding network progressively learns and refines the visual representation of a fashion image to improve its robustness for fashion retrieval. A multi-granularity loss consisting of attribute-level and class-level losses is proposed to introduce appropriate inductive bias to learn across different granularities of the fashion representations. Experimental results on three benchmark datasets demonstrate the effectiveness of our method, which outperforms the state-of-the-art methods by a large margin.
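A multi-granularity loss of this kind can be sketched as a weighted sum of an attribute-level term and a class-level term; the cross-entropy choice and the weights below are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn as nn

class MultiGranularityLoss(nn.Module):
    """Sketch of a multi-granularity objective combining attribute-level and
    class-level supervision (cross-entropy terms and weights are assumptions)."""

    def __init__(self, attr_weight=1.0, class_weight=1.0):
        super().__init__()
        self.attr_ce = nn.CrossEntropyLoss()
        self.class_ce = nn.CrossEntropyLoss()
        self.attr_weight = attr_weight
        self.class_weight = class_weight

    def forward(self, attr_logits, attr_labels, class_logits, class_labels):
        # Fine-grained supervision on attributes plus coarse supervision on classes.
        return (self.attr_weight * self.attr_ce(attr_logits, attr_labels)
                + self.class_weight * self.class_ce(class_logits, class_labels))
```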
ISBN:
(Print) 9798350365474
This study investigates the integration of vision language models (VLMs) to enhance the classification of situations within rugby match broadcasts. Accurately identifying situations in sports videos is important for understanding game dynamics and facilitating downstream tasks such as performance evaluation and injury prevention. Using a dataset comprising 18,000 labeled images extracted at 0.2-second intervals from 100 minutes of rugby match broadcasts, scene classification tasks covering contact plays (scrums, mauls, rucks, tackles, and lineouts), rucks, tackles, and lineouts, as well as multiclass classification, were performed. The study aims to validate the utility of VLM outputs in improving classification performance compared to using image data alone. Experimental results demonstrate substantial performance improvements across all tasks when VLM outputs are incorporated. Our analysis of prompts suggests that, when provided with appropriate contextual information through natural language, VLMs can effectively capture the context of a given image. These findings indicate that leveraging VLMs in sports analysis holds promise for developing image processing models capable of incorporating the tacit knowledge encoded within language models, as well as information conveyed through natural language descriptions.
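One plausible way to combine VLM outputs with image data, sketched below under assumed encoders and dimensions, is to embed the VLM's textual description of a frame and concatenate it with the image features before classification; the paper's actual fusion may differ.

```python
import torch
import torch.nn as nn

class ImageTextFusionClassifier(nn.Module):
    """Sketch: fuse image features with an embedding of the VLM's textual
    description of a broadcast frame (dimensions and concatenation are assumptions)."""

    def __init__(self, image_dim=512, text_dim=512, num_classes=5):
        super().__init__()
        self.classifier = nn.Linear(image_dim + text_dim, num_classes)

    def forward(self, image_feat, vlm_text_feat):
        # image_feat: features from an image backbone for one frame
        # vlm_text_feat: embedding of the VLM's natural-language description of the same frame
        fused = torch.cat([image_feat, vlm_text_feat], dim=-1)
        return self.classifier(fused)
```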
ISBN:
(Print) 9798350365474
Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Because the complexity of human emotional expressions gives rise to compound expressions, both local and global facial expressions must be considered comprehensively for recognition. In this paper, to address this issue, we propose a solution for compound expression recognition based on ensemble learning. Specifically, we treat the task as classification and train three expression classification models based on convolutional networks (ResNet50), Vision Transformers, and multi-scale local attention networks, respectively. Then, using late fusion, we integrate the outputs of the three models to predict the final result, leveraging the strengths of the different models. Our method achieves high accuracy on RAF-DB and, in the sixth Affective Behavior Analysis in-the-wild (ABAW) Challenge, achieves an F1 score of 0.224 on the test set of C-EXPR-DB.
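Late fusion of the three classifiers can be sketched as a weighted average of their softmax outputs; the uniform weights below are an assumption about how the ensemble is combined.

```python
import torch

def late_fusion_predict(models, image, weights=None):
    """Sketch of late fusion over three expression classifiers (e.g. a ResNet50,
    a ViT, and a multi-scale local attention network). Uniform weights and
    softmax averaging are illustrative assumptions."""
    probs = [torch.softmax(m(image), dim=-1) for m in models]
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    fused = sum(w * p for w, p in zip(weights, probs))  # weighted average of class probabilities
    return fused.argmax(dim=-1)                          # predicted compound expression per image
```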
ISBN:
(Print) 9798350301298
This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). Existing works have proposed token reduction approaches to improve the efficiency of ViT-based image classification networks, but these methods are not directly applicable to semantic segmentation, which we address in this work. We observe that, for semantic segmentation, multiple image patches can share a token if they contain the same semantic class, as they contain redundant information. Our approach leverages this by employing an efficient, class-agnostic policy network that predicts if image patches contain the same semantic class, and lets them share a token if they do. With experiments, we explore the critical design choices of CTS and show its effectiveness on the ADE20K, Pascal Context and Cityscapes datasets, various ViT backbones, and different segmentation decoders. With Content-aware Token Sharing, we are able to reduce the number of processed tokens by up to 44%, without diminishing the segmentation quality.
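The token-sharing step itself can be sketched as averaging the embeddings of patches that the policy network groups together; the grouping format and averaging below are assumptions about the mechanism.

```python
import torch

def share_tokens(patch_tokens, share_groups):
    """Sketch of the token-sharing step: patches predicted to belong to the same
    semantic class contribute one shared (averaged) token.

    patch_tokens: (num_patches, dim) embeddings of image patches
    share_groups: list of lists of patch indices, one list per shared token (assumed format)
    """
    shared = [patch_tokens[idx].mean(dim=0) for idx in share_groups]
    return torch.stack(shared)  # reduced token set fed to the ViT backbone
```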
ISBN:
(Print) 9798350365474
In this paper, we explore the cross-modal adaptation of pre-trained Vision Transformers (ViTs) for the audio-visual domain by incorporating a limited set of trainable parameters. To this end, we propose a Spatial-Temporal-Global Cross-Modal Adaptation (STG-CMA) to gradually equip the frozen ViTs with the capability for learning audio-visual representation, consisting of modality-specific temporal adaptation for temporal reasoning within each modality, cross-modal spatial adaptation for refining spatial information with cues from the counterpart modality, and cross-modal global adaptation for global interaction between the audio and visual modalities. Our STG-CMA reveals a meaningful finding: leveraging only a shared pre-trained image model with inserted lightweight adapters is sufficient for spatial-temporal modeling and feature interaction across the audio-visual modalities. Extensive experiments indicate that our STG-CMA achieves state-of-the-art performance on various audio-visual understanding tasks including AVE, AVS, and AVQA while requiring significantly fewer tunable parameters. The code is available at https://***/kaiw7/STG-CMA.
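A lightweight adapter inserted into a frozen ViT block typically takes a bottleneck form, as in the sketch below; the bottleneck ratio and activation are assumptions, and STG-CMA's actual spatial, temporal, and global adaptation paths are more elaborate.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Sketch of a lightweight adapter for a frozen ViT block; only the small
    down/up projections would be trained while the ViT weights stay frozen."""

    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # trainable down-projection
        self.up = nn.Linear(bottleneck, dim)    # trainable up-projection
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual adaptation of frozen features
```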
ISBN:
(Print) 9798350301298
Image recognition models that work in challenging environments (e.g., extremely dark, blurry, or high dynamic range conditions) are highly useful. However, creating training datasets for such environments is expensive and difficult because of the challenges of data collection and annotation. It would be desirable to obtain a robust model without the need for hard-to-obtain datasets. One simple approach is to apply data augmentation such as color jitter and blur to standard RGB (sRGB) images of simple scenes. Unfortunately, this approach struggles to yield realistic images in terms of pixel intensity and noise distribution because it does not consider the non-linearity of Image Signal Processors (ISPs) or the noise characteristics of image sensors. Instead, we propose a noise-accounted RAW image augmentation method. In essence, color jitter and blur augmentation are applied to a RAW image before the non-linear ISP is applied, resulting in realistic intensities. Furthermore, we introduce a noise amount alignment method that calibrates the domain gap in noise properties caused by the augmentation. We show that our proposed noise-accounted RAW augmentation method doubles image recognition accuracy in challenging environments using only simple training data.
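The core idea, applying photometric augmentation in linear RAW space before the non-linear ISP, can be sketched as follows; the gain-jitter form is an assumption and the paper's noise-amount alignment step is omitted.

```python
import torch

def augment_raw_then_isp(raw, isp, color_gain_range=(0.8, 1.2)):
    """Sketch of RAW-space augmentation before the ISP.

    raw: linear RAW image tensor with channels first (C, H, W); isp: callable
    mapping RAW -> sRGB. The per-channel gain jitter is an illustrative choice.
    """
    low, high = color_gain_range
    gain = torch.empty(raw.shape[0], 1, 1).uniform_(low, high)  # per-channel colour jitter in linear space
    raw_aug = raw * gain
    # Intensities stay realistic because the non-linear ISP is applied after augmentation.
    return isp(raw_aug)
```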