Search results - Inner Mongolia University Library

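Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016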
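
The search rules above can be sketched as a small query builder. This is only an illustration of the documented syntax (field code, `=`, value, uppercase operators with one space on each side); the helper names `term`, `combine`, and `group` are hypothetical and are not part of the library's system.

```python
# Field codes and operators exactly as listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    if len(parts) < 2:
        raise ValueError("need at least two parts to combine")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first documented example from its parts.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Validating the operator up front catches the most likely mistake the help text warns about, a lowercase `and` or `or`.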

Refine results

Document type

11,885 papers: Conference
5 papers: Journal article

Collection

11,890 papers: Electronic documents
0 titles: Print holdings

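Subject classification

8,059 papers: Engineering
- 7,617 papers: Computer Science and Technology...
- 796 papers: Mechanical Engineering
- 688 papers: Electrical Engineering
- 360 papers: Software Engineering
- 228 papers: Control Science and Engineering
- 40 papers: Optical Engineering
- 19 papers: Bioengineering
- 17 papers: Information and Communication Engineering
- 12 papers: Biomedical Engineering (conferrable as...
- 6 papers: Electronic Science and Technology (conferrable...
- 6 papers: Architecture
- 6 papers: Transportation Engineering
- 5 papers: Instrument Science and Technology
- 5 papers: Chemical Engineering and Technology
- 5 papers: Safety Science and Engineering
- 4 papers: Civil Engineering
3,347 papers: Medicine
- 3,346 papers: Clinical Medicine
- 4 papers: Basic Medicine (conferrable as medicine...
- 4 papers: Public Health and Preventive Medi...
253 papers: Science
- 198 papers: Systems Science
- 32 papers: Physics
- 21 papers: Biology
- 18 papers: Mathematics
- 9 papers: Statistics (conferrable as science, ...
- 7 papers: Chemistry
17 papers: Management
- 12 papers: Management Science and Engineering (conferrable...
- 7 papers: Library, Information and Archives Manage...
- 5 papers: Business Administration
3 papers: Law
- 3 papers: Sociology
3 papers: Education
- 3 papers: Education
2 papers: Agriculture
1 paper: Economics
1 paper: Military Science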

Topic

5,633 papers: computer vision
2,668 papers: training
2,203 papers: pattern recognit...
1,747 papers: computational mo...
1,502 papers: visualization
1,360 papers: three-dimensiona...
1,074 papers: semantics
999 papers: benchmark testin...
986 papers: codes
959 papers: computer archite...
891 papers: deep learning
777 papers: conferences
754 papers: task analysis
700 papers: feature extracti...
561 papers: transformers
533 papers: face recognition
527 papers: neural networks
495 papers: object detection
490 papers: image segmentati...
468 papers: cameras

Institution

174 papers: univ sci & techn...
145 papers: carnegie mellon ...
144 papers: univ chinese aca...
144 papers: tsinghua univ pe...
134 papers: chinese univ hon...
110 papers: zhejiang univ pe...
109 papers: peng cheng lab p...
99 papers: swiss fed inst t...
91 papers: tsinghua univers...
90 papers: shanghai ai lab ...
87 papers: sensetime res pe...
86 papers: shanghai jiao to...
83 papers: zhejiang univers...
82 papers: tech univ munich...
79 papers: university of sc...
79 papers: stanford univ st...
78 papers: univ hong kong p...
77 papers: australian natl ...
76 papers: alibaba grp peop...
75 papers: peng cheng labor...

Author

75 papers: timofte radu
64 papers: van gool luc
50 papers: zhang lei
43 papers: yang yi
37 papers: loy chen change
36 papers: tao dacheng
32 papers: zhou jie
31 papers: chen chen
30 papers: liu yang
30 papers: tian qi
29 papers: sun jian
29 papers: zha zheng-jun
28 papers: li xin
27 papers: qi tian
26 papers: vasconcelos nuno
25 papers: liu xiaoming
25 papers: darrell trevor
24 papers: zheng wei-shi
24 papers: luo ping
24 papers: ying shan

Language

11,863 papers: English
26 papers: Other
1 paper: Chinese

Search query: "Any field = 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,890 records in total; showing records 481-490.

Zero-Shot Audio-Visual Compound Expression Recognition Method Based on Emotion Probability Fusion

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ryumina, Elena; Markitantov, Maxim; Ryumin, Dmitry; Kaya, Heysem; Karpov, Alexey

Affiliations: Russian Acad Sci St Petersburg Fed Res Ctr St Petersburg Russia; Univ Utrecht Dept Informat & Comp Sci Utrecht Netherlands

ISBN: (print) 9798350365474

Compound Expression Recognition (CER), a sub-field of affective computing, is a novel task in intelligent human-computer interaction and multimodal user interfaces. We propose a novel audio-visual method for CER. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on the pair-wise sum of weighted emotion probability distributions. Notably, our method does not use any training data specific to the target task. Thus, the problem is a zero-shot classification task. The method is evaluated in multi-corpus training and cross-corpus validation setups. We achieved F1 scores of 32.15% and 25.56% for the AffWild2 and C-EXPR-DB test subsets without training on target corpus and target task, respectively. Therefore, our method is on par with methods trained on the target corpus or target task. The source code is publicly available from https://***/AVCER/.

Keywords: audio-visual emotion recognition; compound expression recognition; zero-shot classification
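
The abstract above describes fusing modalities at the emotion probability level and scoring compound expressions by a pair-wise sum of weighted probabilities. A minimal sketch of that idea follows. The emotion labels, modality weights, probability values, and candidate pairs are all made-up assumptions for illustration; the abstract does not give the authors' actual weighting scheme.

```python
def fuse_probabilities(audio, video, w_audio=0.5, w_video=0.5):
    """Weighted sum of per-emotion probabilities from two modalities.

    The weights are hypothetical; the abstract does not state them.
    """
    return {e: w_audio * audio[e] + w_video * video[e] for e in audio}


def predict_compound(fused, compound_pairs):
    """Score each compound expression as the sum of its two basic emotions."""
    scores = {pair: fused[pair[0]] + fused[pair[1]] for pair in compound_pairs}
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative inputs only (not taken from the paper).
audio_probs = {"happy": 0.1, "surprise": 0.5, "fear": 0.3, "sad": 0.1}
video_probs = {"happy": 0.4, "surprise": 0.4, "fear": 0.1, "sad": 0.1}
pairs = [("fear", "surprise"), ("happy", "surprise"), ("sad", "fear")]

fused = fuse_probabilities(audio_probs, video_probs)
best_pair, pair_scores = predict_compound(fused, pairs)
print(best_pair)
```

Because no compound-expression labels are used anywhere, the decision rule needs no training on the target task, which is what makes the setup zero-shot.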

Three Pillars Improving Vision Foundation Model Distillation for Lidar

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Puy, Gilles; Gidaris, Spyros; Boulch, Alexandre; Simeoni, Oriane; Sautier, Corentin; Perez, Patrick; Bursuc, Andrei; Marlet, Renaud

Affiliations: Valeo ai Paris France; Kyutai Paris France; Univ Gustave Eiffel CNRS LIGM Ecole Ponts Marne La Vallee France

ISBN: (print) 9798350353006

Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show promising results, obtained thanks to distillation methods that keep improving. Yet, we still notice a large performance gap when measuring by linear probing the quality of distilled vs fully supervised features. In this work, instead of focusing only on the distillation method, we study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbone, and the pretraining 2D+3D dataset. In particular, thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality. This allows us to significantly reduce the gap between the quality of distilled and fully-supervised 3D features, and to improve the robustness of the pretrained backbones to domain gaps and perturbations. The code is available at https://***/valeoai/ScaLR.

Keywords: Semantic Segmentation
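
The abstract above concerns distilling 2D image features into a 3D lidar backbone. As a rough sketch of what a feature-distillation objective can look like, the code below averages a cosine distance between paired point and pixel features. This particular loss is an assumption chosen because it is a common choice; the abstract does not state which loss ScaLR uses, and the function names are hypothetical.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a feature vector to unit length (eps guards against zero norm)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def distillation_loss(point_features, pixel_features):
    """Mean (1 - cosine similarity) over paired 3D-point / 2D-pixel features.

    Each point feature is assumed to be paired with the feature of the pixel
    it projects to; how the pairs are built is outside this sketch.
    """
    if len(point_features) != len(pixel_features) or not point_features:
        raise ValueError("need equally sized, non-empty feature lists")
    total = 0.0
    for p, q in zip(point_features, pixel_features):
        p_hat, q_hat = l2_normalize(p), l2_normalize(q)
        total += 1.0 - sum(a * b for a, b in zip(p_hat, q_hat))
    return total / len(point_features)


# Identical features give a loss near 0; orthogonal features give 1.
loss_same = distillation_loss([[1.0, 2.0]], [[2.0, 4.0]])
loss_orthogonal = distillation_loss([[1.0, 0.0]], [[0.0, 3.0]])
print(loss_same, loss_orthogonal)
```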

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Pancheng; Xu, Peng; Qin, Pengda; Fan, Deng-Ping; Zhang, Zhicheng; Jia, Guoli; Zhou, Bowen; Yang, Jufeng

Affiliations: Nankai Univ Coll Comp Sci VCIP Tianjin Peoples R China; Nankai Univ Coll Comp Sci TMCC Tianjin Peoples R China; Nankai Univ Coll Comp Sci DISSec Tianjin Peoples R China; Nankai Int Adv Res Inst Shenzhen Peoples R China; Tsinghua Univ Dept Elect Engn Beijing Peoples R China; Alibaba Grp Hangzhou Peoples R China

ISBN: (print) 9798350353013; 9798350353006

Camouflaged vision perception is an important vision task with numerous practical applications. Due to the expensive collection and labeling costs, this community struggles with a major bottleneck: its datasets cover only a small number of object species. However, the existing camouflaged generation methods require specifying the background manually, thus failing to extend the camouflaged sample diversity in a low-cost manner. In this paper, we propose a Latent Background Knowledge Retrieval-Augmented Diffusion (LAKE-RED) for camouflaged image generation. To our knowledge, our contributions mainly include: (1) For the first time, we propose a camouflaged generation paradigm that does not need to receive any background inputs. (2) Our LAKE-RED is the first knowledge retrieval-augmented method with interpretability for camouflaged generation, in which we propose that knowledge retrieval and reasoning enhancement be separated explicitly, to alleviate the task-specific challenges. Moreover, our method is not restricted to specific foreground targets or backgrounds, offering a potential for extending camouflaged vision perception to more diverse domains. (3) Experimental results demonstrate that our method outperforms the existing approaches, generating more realistic camouflaged images. Our source code is released on https://***/PanchengZhao/LAKE-RED.

Keywords: camouflage object detection; camouflaged vision perception; diffusion generative model; knowledge retrieval; reasoning enhancement

ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Yuanhang; Yang, Shuang; Shan, Shiguang; Chen, Xilin

Affiliations: Chinese Acad Sci Inst Comp Technol Key Lab Intelligent Informat Proc Beijing 100190 Peoples R China; Univ Chinese Acad Sci Beijing 100049 Peoples R China

ISBN: (print) 9798350353006

We propose a novel strategy, ES3, for self-supervised learning of robust audio-visual speech representations from unlabeled talking face videos. While many recent approaches for this task primarily rely on guiding the learning process using the audio modality alone to capture information shared between audio and video, we reframe the problem as the acquisition of shared, unique (modality-specific) and synergistic speech information to address the inherent asymmetry between the modalities. Based on this formulation, we propose a novel "evolving" strategy that progressively builds joint audio-visual speech representations that are strong for both uni-modal (audio & visual) and bi-modal (audio-visual) speech. First, we leverage the more easily learnable audio modality to initialize audio and visual representations by capturing audio-unique and shared speech information. Next, we incorporate video-unique speech information and bootstrap the audio-visual representations on top of the previously acquired shared knowledge. Finally, we maximize the total audio-visual speech information, including synergistic information to obtain robust and comprehensive representations. We implement ES3 as a simple Siamese framework and experiments on both English benchmarks and a newly contributed large-scale Mandarin dataset show its effectiveness. In particular, on LRS2-BBC, our smallest model is on par with SoTA models with only 1/2 parameters and 1/8 unlabeled data (223h).

Keywords: audio-visual speech representations; multi-modal learning; representation learning; self-supervised learning; speech recognition

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Koch, Sebastian; Vaskevicius, Narunas; Colosi, Mirco; Hermosilla, Pedro; Ropinski, Timo

Affiliations: Bosch Ctr Artificial Intelligence Stuttgart Germany; Robert Bosch Corp Res Stuttgart Germany; Univ Ulm Ulm Germany; TU Vienna Vienna Austria

ISBN: (print) 9798350353006

Current approaches for 3D scene graph prediction rely on labeled datasets to train models for a fixed set of known object classes and relationship categories. We present Open3DSG, an alternative approach to learn 3D scene graph prediction in an open world without requiring labeled scene graph data. We co-embed the features from a 3D scene graph prediction backbone with the feature space of powerful open world 2D vision language foundation models. This enables us to predict 3D scene graphs from 3D point clouds in a zero-shot manner by querying object classes from an open vocabulary and predicting the inter-object relationships from a grounded LLM with scene graph features and queried object classes as context. Open3DSG is the first 3D point cloud method to predict not only explicit open-vocabulary object classes, but also open-set relationships that are not limited to a predefined label set, making it possible to express rare as well as specific objects and relationships in the predicted 3D scene graph. Our experiments show that Open3DSG is effective at predicting arbitrary object classes as well as their complex inter-object relationships describing spatial, supportive, semantic and comparative relationships.

Keywords: 3D scene graphs; 3D scene representation; 3D scene understanding; open-vocabulary; open-world; point cloud; scene graph; vision-language; VLM

OpenEQA: Embodied Question Answering in the Era of Foundation Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Majumdar, Arjun; Ajay, Anurag; Zhang, Xi Aohan; Punya, Pranav; Yenamandra, Sriram; Henaff, Mikael; Silwal, Sneha; Mcvay, Paul; Maksymets, Oleksandr; Arnaud, Sergio; Yadav, Karmesh; Li, Qiyang; Newman, Ben; Sharma, Mohit; Berges, Vincent; Zhang, Shiqi; Agrawal, Pulkit; Bisk, Yonatan; Batra, Dhruv; Kalakrishnan, Mrinal; Meier, Franziska; Paxton, Chris; Sax, Alexander; Rajeswaran, Aravind

Affiliations: Georgia Tech Atlanta GA 30332 USA; MIT 77 Massachusetts Ave Cambridge MA 02139 USA; SUNY Binghamton Binghamton NY USA; Meta AI Menlo Pk CA USA; Univ Calif Berkeley Berkeley CA USA; CMU Pittsburgh PA USA; Meta Fundamental AI Res FAIR Menlo Pk CA USA

ISBN: (print) 9798350353006

We present a modern formulation of Embodied Question Answering (EQA) as the task of understanding an environment well enough to answer questions about it in natural language. An agent can achieve such an understanding by either drawing upon episodic memory, exemplified by agents on smart glasses, or by actively exploring the environment, as in the case of mobile robots. We accompany our formulation with OpenEQA, the first open-vocabulary benchmark dataset for EQA supporting both episodic memory and active exploration use cases. OpenEQA contains over 1600 high-quality human-generated questions drawn from over 180 real-world environments. In addition to the dataset, we also provide an automatic LLM-powered evaluation protocol that has excellent correlation with human judgement. Using this dataset and evaluation protocol, we evaluate several state-of-the-art foundation models including GPT-4V, and find that they significantly lag behind human-level performance. Consequently, OpenEQA stands out as a straightforward, measurable, and practically relevant benchmark that poses a considerable challenge to the current generation of foundation models. We hope this inspires and stimulates future research at the intersection of Embodied AI, conversational agents, and world models.

Keywords: Embodied AI; Embodied Question Answering; Vision-Language Models

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Huajian; Liu, Changkun; Zhu, Yipeng; Cheng, Hui; Braud, Tristan; Yeung, Sai-Kit

Affiliations: Hong Kong Univ Sci & Technol Hong Kong Peoples R China; Sun Yat Sen Univ Guangzhou Peoples R China

ISBN: (print) 9798350353006

Portable 360° cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360° images with ground truth poses for visual localization. We present a practical implementation of 360° mapping combining 360° images with lidar data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360° reference frames, and query frames from pinhole, ultra-wide FoV fisheye, and 360° cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360° images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360-camera mapping and omnidirectional visual localization with cross-device queries. Project Page and dataset: https://***/research/360Loc/.

Keywords: omnidirectional vision; visual localization

Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Junxi; Li, Liang; Su, Li; Zha, Zheng-Jun; Huang, Qingming

Affiliations: Univ Chinese Acad Sci Beijing Peoples R China; Chinese Acad Sci Key Lab Intell Info Proc ICT Beijing Peoples R China; Peng Cheng Lab Shenzhen Peoples R China; Univ Sci & Technol China Hefei Peoples R China; Chinese Acad Sci Key Lab Safety Beijing Peoples R China

ISBN: (print) 9798350353006

Weakly-supervised Video Anomaly Detection (wVAD) aims to detect frame-level anomalies using only video-level labels in training. Due to the limitation of coarse-grained labels, Multi-Instance Learning (MIL) is prevailing in wVAD. However, the binary supervision in MIL is insufficient to model diverse abnormal patterns. Besides, the coupling between abnormality and its context hinders the learning of clear abnormal event boundaries. In this paper, we propose prompt-enhanced MIL to detect various abnormal events while ensuring clear event boundaries. Concretely, we design the abnormal-aware prompts by using abnormal class annotations together with a learnable prompt, which can incorporate semantic priors into video features dynamically. The detector can utilize the semantic-rich features to capture diverse abnormal patterns. In addition, a normal context prompt is introduced to amplify the distinction between abnormality and its context, facilitating the generation of clear boundaries. With the mutual enhancement of abnormal-aware and normal context prompts, the model can construct discriminative representations to detect divergent anomalies without ambiguous event boundaries. Extensive experiments demonstrate our method achieves SOTA performance on three public benchmarks. The code is available at https://***/Junxi-Chen/PE-MIL.

Keywords: computer vision; video anomaly detection
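
The abstract above builds on Multi-Instance Learning, where only a video-level label supervises snippet-level anomaly scores. The sketch below shows a generic top-k MIL objective of the kind the abstract calls binary supervision. It is not the prompt-enhanced method itself; the top-k pooling, the value of `k`, and the example scores are assumptions for illustration.

```python
import math


def video_score(snippet_scores, k):
    """Pool snippet anomaly scores into one video score (mean of the top k)."""
    if k < 1 or not snippet_scores:
        raise ValueError("need k >= 1 and at least one snippet score")
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


def binary_cross_entropy(prediction, label, eps=1e-7):
    """Binary cross-entropy between a pooled score and the video-level label."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Illustrative snippet scores for one video labelled abnormal (label = 1).
scores = [0.1, 0.9, 0.8, 0.2]
pooled = video_score(scores, k=2)
loss_if_abnormal = binary_cross_entropy(pooled, 1)
loss_if_normal = binary_cross_entropy(pooled, 0)
print(pooled, loss_if_abnormal, loss_if_normal)
```

The label only says whether the video contains an anomaly, not where, which is why such a loss gives little signal about event boundaries.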

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Tsai-Shien; Siarohin, Aliaksandr; Menapace, Willi; Deyneka, Ekaterina; Chao, Hsiang-wei; Jeon, Byung Eun; Fang, Yuwei; Lee, Hsin-Ying; Ren, Jian; Yang, Ming-Hsuan; Tulyakov, Sergey

Affiliations: Snap Inc Santa Monica CA 90405 USA; Univ Calif Merced Merced CA 95343 USA; Univ Trento Trento Italy; Snap Santa Monica CA USA

ISBN: (print) 9798350353006

The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions. Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames. Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video. Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions. We dub the dataset Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation. The models trained on the proposed data score substantially better on the majority of metrics across all the tasks.

Keywords: multimodal learning; video captioning; vision-language dataset
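
The abstract above describes using a retrieval model to pick the best caption among those produced by several teacher models. A minimal sketch of that selection step follows, assuming the retrieval model yields one embedding per video and per caption and ranks by cosine similarity. The embeddings, captions, and function names are hypothetical; the abstract does not specify the scoring function.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_best_caption(video_embedding, candidates):
    """Return the caption whose embedding is most similar to the video's.

    candidates is a list of (caption_text, caption_embedding) pairs, one per
    teacher model.
    """
    if not candidates:
        raise ValueError("need at least one candidate caption")
    best = max(candidates, key=lambda c: cosine_similarity(video_embedding, c[1]))
    return best[0]


# Made-up embeddings standing in for a retrieval model's output.
video = [1.0, 0.0]
teacher_captions = [
    ("a cat sleeps on a sofa", [0.1, 0.9]),
    ("a dog runs on a beach", [0.9, 0.1]),
]
chosen = select_best_caption(video, teacher_captions)
print(chosen)
```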

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sun, Jiakai; Jiao, Han; Li, Guangyuan; Zhang, Zhanjie; Zhao, Lei; Xing, Wei

Affiliations: Zhejiang Univ Hangzhou Peoples R China

ISBN: (print) 9798350353006

Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naive approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.

Keywords: 3D Gaussian Splatting; 3D vision; Dynamic Scene Reconstruction; Free-Viewpoint Video; Streaming Media
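
The abstract above says per-frame motion is modelled as translations and rotations of the 3D Gaussians rather than by re-optimizing them. The sketch below shows only the final step of applying such a transformation to one Gaussian's mean and orientation. Representing orientation as a `(w, x, y, z)` quaternion and left-multiplying by the predicted rotation are assumptions; the abstract does not describe how the Neural Transformation Cache produces or applies its outputs.

```python
import math


def quat_multiply(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_normalize(q):
    """Rescale a quaternion to unit length so it stays a valid rotation."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def transform_gaussian(mean, rotation, translation, delta_rotation):
    """Shift a Gaussian's mean and compose its orientation with a rotation."""
    new_mean = tuple(m + t for m, t in zip(mean, translation))
    new_rotation = quat_normalize(quat_multiply(delta_rotation, rotation))
    return new_mean, new_rotation


# A Gaussian already rotated 90 degrees about z, rotated another 90 degrees.
half = math.sqrt(0.5)
quarter_turn_z = (half, 0.0, 0.0, half)
moved_mean, moved_rotation = transform_gaussian(
    (1.0, 2.0, 3.0), quarter_turn_z, (0.5, 0.0, -1.0), quarter_turn_z
)
print(moved_mean, moved_rotation)
```

Storing one small transformation per Gaussian per frame, instead of a full set of optimized parameters, is consistent with the storage savings the abstract reports.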

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
