Search Results - Inner Mongolia University Library

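Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016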
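
The rules above (field code, `=`, value, uppercase operators with one space on each side) can be sketched as a small query builder. This is only an illustration: the helper names `term`, `join`, and `group` are hypothetical and are not part of the library's system; only the field codes and operator rules are taken from the help text.

```python
# Field codes listed in the help text (T, A, K, P, PU, O, L, C, U, Y).
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
# Operators must be uppercase, per the search rules.
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build a single FIELD=value term (hypothetical helper)."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def join(operator: str, *parts: str) -> str:
    """Join terms with one operator, padded by a single space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
example_one = join(
    "AND",
    group(join("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Rejecting lowercase operators in `join` mirrors the note that operators must be written in uppercase.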

Search query: "Any field=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,891 records in total; showing records 1411-1420

Context-aware Alignment and Mutual Masking for 3D-Language Pre-training

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jin, Zhao; Hayat, Munawar; Yang, Yuwei; Guo, Yulan; Lei, Yinjie

Affiliations: Sichuan Univ, Chengdu, Peoples R China; Monash Univ, Melbourne, Vic, Australia; Sun Yat Sen Univ, Guangzhou, Peoples R China

ISBN: (print) 9798350301298

3D visual language reasoning plays an important role in effective human-computer interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre-training methods to learn generic representations that can transfer across various tasks. Despite the encouraging progress in vision-language pre-training for image-text data, 3D-language pre-training is still an open issue due to limited 3D-language paired data, the highly sparse and irregular structure of point clouds, and ambiguities in spatial relations of 3D objects with viewpoint changes. In this paper, we present a generic 3D-language pre-training approach that tackles multiple facets of 3D-language reasoning by learning universal representations. Our learning objective constitutes two main parts. 1) Context-aware spatial-semantic alignment to establish fine-grained correspondence between point clouds and texts. It reduces relational ambiguities by aligning 3D spatial relationships with textual semantic context. 2) Mutual 3D-language masked modeling to enable cross-modality information exchange. Instead of reconstructing sparse 3D points for which language can hardly provide cues, we propose masked proposal reasoning to learn semantic class and mask-invariant representations. Our proposed 3D-language pre-training method achieves promising results once adapted to various downstream tasks, including 3D visual grounding, 3D dense captioning and 3D question answering. Our code is available at https://***/leolyj/3D-VLP

Keywords: language reasoning vision

Joint Depth Prediction and Semantic Segmentation with Multi-View SAM

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Shvets, Mykhailo; Zhao, Dongxu; Niethammer, Marc; Sengupta, Roni; Berg, Alexander C.

Affiliations: Univ North Carolina, Chapel Hill, NC 27515, USA; Univ Calif Irvine, Irvine, CA, USA

ISBN: (print) 9798350318920; 9798350318937

Multi-task approaches to joint depth and segmentation prediction are well-studied for monocular images. Yet, predictions from a single-view are inherently limited, while multiple views are available in many robotics applications. On the other end of the spectrum, video-based and full 3D methods require numerous frames to perform reconstruction and segmentation. With this work we propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM). This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder. We report the mutual benefit that both tasks enjoy in our quantitative and qualitative studies on the ScanNet dataset. Our approach consistently outperforms single-task MVS and segmentation models, along with multi-task monocular methods.

Keywords: 3D computer vision Algorithms Algorithms Image recognition and understanding

CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zheng, Jiangbin; Wang, Yile; Tan, Cheng; Li, Siyuan; Wang, Ge; Xia, Jun; Chen, Yidong; Li, Stan Z.

Affiliations: Westlake Univ, AI Lab, Res Ctr Ind Future, Hangzhou, Peoples R China; Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China; Xiamen Univ, Sch Informat, Xiamen, Peoples R China

ISBN: (print) 9798350301298

Sign language recognition (SLR) is a weakly supervised task that annotates sign videos as textual glosses. Recent studies show that insufficient training caused by the lack of large-scale available sign datasets has become the main bottleneck for SLR. Most SLR works therefore adopt pretrained visual modules and follow one of two mainstream solutions. The multi-stream architectures extend multi-cue visual features, yielding the current SOTA performance but requiring complex designs that might introduce noise. Alternatively, the advanced single-cue SLR frameworks using explicit cross-modal alignment between visual and textual modalities are simple and effective, and potentially competitive with the multi-cue framework. In this work, we propose a novel contrastive visual-textual transformation for SLR, CVT-SLR, to fully explore the pretrained knowledge of both the visual and language modalities. Based on the single-cue cross-modal alignment framework, we propose a variational autoencoder (VAE) for pretrained contextual knowledge while introducing the complete pretrained language module. The VAE implicitly aligns visual and textual modalities while benefiting from pretrained contextual knowledge as the traditional contextual module does. Meanwhile, a contrastive cross-modal alignment algorithm is designed to explicitly enhance the consistency constraints. Extensive experiments on public datasets (PHOENIX-2014 and PHOENIX-2014T) demonstrate that our proposed CVT-SLR consistently outperforms existing single-cue methods and even outperforms SOTA multi-cue methods. The source code and models are available at https://***/binbinjiang/CVT-SLR.

Keywords: language reasoning vision

PGVT: Pose-Guided Video Transformer for Fine-Grained Action Recognition

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Zhang, Haosong; Leong, Mei Chee; Li, Liyuan; Lin, Weisi

Affiliations: ASTAR, Inst Infocomm Res I2R, Singapore, Singapore; Nanyang Technol Univ, Singapore, Singapore

ISBN: (print) 9798350318920; 9798350318937

Based on recent advancements in transformer-based video models and multi-modal joint learning, we propose a novel model, named Pose-Guided Video Transformer (PGVT), to incorporate sparse high-level body joint locations and dense low-level visual pixels for effective learning and accurate recognition of human actions. PGVT leverages pre-trained image models by freezing their parameters and introducing trainable adapters to effectively integrate two input modalities, i.e., human poses and video frames, to learn a pose-focused spatiotemporal representation of human actions. We design two novel core modules, i.e., Pose Temporal Attention and Pose-Video Spatial Attention, to facilitate interaction between body joint locations and uniform video tokens, enriching each modality with contextualized information from the other. We evaluate the PGVT model on four action recognition datasets: Diving48, Gym99, and Gym288 for fine-grained action recognition, and Kinetics400 for coarse-grained action recognition. Our model achieves new SOTA performance on the three fine-grained human action recognition datasets and comparable performance on Kinetics400 with a small number of tunable parameters compared with SOTA methods. Ablation studies verify the benefits of our new designs.

Keywords: Algorithms Algorithms Algorithms and algorithms formulations Machine learning architectures Video recognition and understanding vision + language and/or other modalities

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Su, Weijie; Zhu, Xizhou; Tao, Chenxin; Lu, Lewei; Li, Bin; Huang, Gao; Qiao, Yu; Wang, Xiaogang; Zhou, Jie; Dai, Jifeng

Affiliations: Univ Sci & Technol China, Beijing, Peoples R China; SenseTime Res, Hong Kong, Peoples R China; Tsinghua Univ, Beijing, Peoples R China; Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China; Chinese Univ Hong Kong, Hong Kong, Peoples R China

ISBN: (print) 9798350301298

To effectively exploit the potential of large-scale models, various pre-training strategies supported by massive data from different sources are proposed, including supervised pre-training, weakly-supervised pre-training, and self-supervised pre-training. It has been proved that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models. However, current works adopt a multi-stage pre-training system, where the complex pipeline may increase the uncertainty and instability of the pre-training. It is thus desirable that these strategies can be integrated in a single-stage manner. In this paper, we first propose a general multi-modal mutual information formula as a unified optimization target and demonstrate that all mainstream approaches are special cases of our framework. Under this unified perspective, we propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pretraining). Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, COCO object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation. Notably, we successfully pre-train a billion-level parameter image backbone and achieve state-of-the-art performance on various benchmarks under public data setting. Code shall be released at https://***/OpenGVLab/M3I-Pretraining.

Keywords: detection recognition: Categorization retrieval

Semantic-aware Video Representation for Few-shot Action Recognition

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Tang, Yutao; Bejar, Benjamin; Vidal, Rene

Affiliations: Johns Hopkins Univ, Baltimore, MD 21218, USA; Paul Scherrer Inst, Wurenlingen, Switzerland; Univ Penn, Philadelphia, PA, USA

ISBN: (print) 9798350318920; 9798350318937

Recent work on action recognition leverages 3D features and textual information to achieve state-of-the-art performance. However, most of the current few-shot action recognition methods still rely on 2D frame-level representations, often require additional components to model temporal relations, and employ complex distance functions to achieve accurate alignment of these representations. In addition, existing methods struggle to effectively integrate textual semantics, some resorting to concatenation or addition of textual and visual features, and some using text merely as additional supervision without truly achieving feature fusion and information transfer across modalities. In this work, we propose a simple yet effective Semantic-Aware Few-Shot Action Recognition (SAFSAR) model to address these issues. We show that directly leveraging a 3D feature extractor combined with an effective feature-fusion scheme and a simple cosine similarity for classification can yield better performance without the need for extra components for temporal modeling or complex distance functions. We introduce an innovative scheme to encode the textual semantics into the video representation, which adaptively fuses features from text and video and encourages the visual encoder to extract more semantically consistent features. In this scheme, SAFSAR achieves alignment and fusion in a compact way. Experiments on five challenging few-shot action recognition benchmarks under various settings demonstrate that the proposed SAFSAR model significantly improves the state-of-the-art performance.

Keywords: Algorithms Algorithms Video recognition and understanding vision + language and/or other modalities

Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yan, Siyuan; Yu, Zhen; Zhang, Xuelin; Mahapatra, Dwarikanath; Chandra, Shekhar S.; Janda, Monaca; Soyer, Peter; Ge, Zongyuan

Affiliations: Monash Univ, Clayton, Vic, Australia; Monash Med AI Grp, Melbourne, Vic, Australia; Univ Queensland, Brisbane, Qld, Australia; Incept Inst AI, Abu Dhabi, U Arab Emirates

ISBN: (print) 9798350301298

Deep neural networks have demonstrated promising performance on image recognition tasks. However, they may rely heavily on confounding factors, using irrelevant artifacts or bias within the dataset as the cue to improve performance. When a model makes decisions based on these spurious correlations, it can become untrustable and lead to catastrophic outcomes when deployed in real-world settings. In this paper, we explore and try to solve this problem in the context of skin cancer diagnosis. We introduce a human-in-the-loop framework in the model training process such that users can observe and correct the model's decision logic when confounding behaviors happen. Specifically, our method can automatically discover confounding factors by analyzing the co-occurrence behavior of the samples. It is capable of learning confounding concepts using easily obtained concept exemplars. By mapping the black-box model's feature representation onto an explainable concept space, human users can interpret the concept and intervene via first-order logic instructions. We systematically evaluate our method on our newly crafted, well-controlled skin lesion dataset and several public skin lesion datasets. Experiments show that our method can effectively detect and remove confounding factors from datasets without any prior knowledge about the category distribution and does not require fully annotated concept labels. We also show that our method enables the model to focus on clinically related concepts, improving the model's performance and trustworthiness during model inference.

Keywords: cell microscopy Medical and biological vision

Siamese Image Modeling for Self-Supervised Vision Representation Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Tao, Chenxin; Zhu, Xizhou; Su, Weijie; Huang, Gao; Li, Bin; Zhou, Jie; Qiao, Yu; Wang, Xiaogang; Dai, Jifeng

Affiliations: Tsinghua Univ, Beijing, Peoples R China; SenseTime Res, Hong Kong, Peoples R China; Univ Sci & Technol China, Hefei, Peoples R China; Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China; Chinese Univ Hong Kong, Hong Kong, Peoples R China

ISBN: (print) 9798350301298

Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks. Two mainstream SSL frameworks have been proposed, i.e., Instance Discrimination (ID) and Masked Image Modeling (MIM). ID pulls together representations from different views of the same image, while avoiding feature collapse. It lacks spatial sensitivity, which requires modeling the local structure within each image. On the other hand, MIM reconstructs the original content given a masked image. It instead does not have good semantic alignment, which requires projecting semantically similar views into nearby representations. To address this dilemma, we observe that (1) semantic alignment can be achieved by matching different image views with strong augmentations; (2) spatial sensitivity can benefit from predicting dense representations with masked images. Driven by these analyses, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view, based on another masked view from the same image but with different augmentations. SiameseIM uses a Siamese network with two branches. The online branch encodes the first view, and predicts the second view's representation according to the relative positions between these two views. The target branch produces the target by encoding the second view. SiameseIM can surpass both ID and MIM on a wide range of downstream tasks, including ImageNet finetuning and linear probing, COCO and LVIS detection, and ADE20k semantic segmentation. The improvement is more significant in few-shot, long-tail and robustness-concerned scenarios. Code shall be released.

Keywords: detection recognition: Categorization retrieval

ProcSim: Proxy-based Confidence for Robust Similarity Learning

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Barbany, Oriol; Lin, Xiaofan; Bastan, Muhammet; Dhua, Arnab

Affiliations: CSIC UPC, Inst Robot & Informat Ind, Barcelona, Spain; Amazon Visual Search AR, Seattle, WA, USA; Amazon, Seattle, WA, USA

ISBN: (print) 9798350318920; 9798350318937

Deep Metric Learning (DML) methods aim at learning an embedding space in which distances are closely related to the inherent semantic similarity of the inputs. Previous studies have shown that popular benchmark datasets often contain numerous wrong labels, and DML methods are susceptible to them. Intending to study the effect of realistic noise, we create an ontology of the classes in a dataset and use it to simulate semantically coherent labeling mistakes. To train robust DML models, we propose ProcSim, a simple framework that assigns a confidence score to each sample using the normalized distance to its class representative. The experimental results show that the proposed method achieves state-of-the-art performance on the DML benchmark datasets injected with uniform and the proposed semantically coherent noise.

Keywords: Algorithms Algorithms Image recognition and understanding vision + language and/or other modalities

Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yu, Zhongzhi; Wu, Shang; Fu, Yonggan; Zhang, Shunyao; Lin, Yingyan (Celine)

Affiliations: Georgia Inst Technol, Atlanta, GA 30332, USA; Rice Univ, Houston, TX, USA

ISBN: (print) 9798350301298

Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs with the over-confident patches detected by the above AOD in order to enhance the feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04% ~ 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.

Keywords: continual low-shot meta or long-tail learning Transfer

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
