Search Results - Inner Mongolia University Library

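Search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016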
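
The query syntax in the help text can also be assembled programmatically. The following is a minimal Python sketch, not part of the library's system: the helper names `term`, `combine`, and `group` are hypothetical, and only the field codes and the uppercase, space-padded operators are taken from the help text. It rebuilds the first search example.

```python
# Hypothetical helpers for building query strings in the catalog's syntax.
# Only the field codes and the AND/OR/NOT operators come from the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(clauses)


def group(expression: str) -> str:
    """Wrap an expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


# Rebuild the first search example from the help text.
example_1 = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_1)
```

Rejecting anything other than the exact uppercase operator names follows the rule stated in the help text; how the search system itself handles a lowercase operator is not documented there.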


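Refine search results

Document type

11,884 records: conference papers
5 records: journal articles

Holdings

11,889 records: electronic documents
0 titles: print holdings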

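Subject classification

8,055 records: Engineering
- 7,613 records: Computer Science and Technology...
- 796 records: Mechanical Engineering
- 688 records: Electrical Engineering
- 356 records: Software Engineering
- 225 records: Control Science and Engineering
- 40 records: Optical Engineering
- 19 records: Bioengineering
- 17 records: Information and Communication Engineering
- 12 records: Biomedical Engineering (...
- 6 records: Electronic Science and Technology (...
- 6 records: Architecture
- 6 records: Transportation Engineering
- 5 records: Instrument Science and Technology
- 5 records: Chemical Engineering and Technology
- 5 records: Safety Science and Engineering
- 4 records: Civil Engineering
3,344 records: Medicine
- 3,343 records: Clinical Medicine
- 4 records: Basic Medicine (...
- 4 records: Public Health and Preventive Medi...
250 records: Science
- 198 records: Systems Science
- 29 records: Physics
- 21 records: Biology
- 15 records: Mathematics
- 9 records: Statistics (...
- 4 records: Chemistry
17 records: Management
- 12 records: Management Science and Engineering (...
- 7 records: Library, Information and Archives Manage...
- 5 records: Business Administration
3 records: Law
- 3 records: Sociology
3 records: Education
- 3 records: Education
2 records: Agriculture
1 record: Economics
1 record: Military Science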

Subject

5,633 records: computer vision
2,668 records: training
2,203 records: pattern recognit...
1,747 records: computational mo...
1,502 records: visualization
1,360 records: three-dimensiona...
1,074 records: semantics
999 records: benchmark testin...
986 records: codes
959 records: computer archite...
891 records: deep learning
777 records: conferences
754 records: task analysis
700 records: feature extracti...
561 records: transformers
533 records: face recognition
527 records: neural networks
495 records: object detection
490 records: image segmentati...
468 records: cameras

Institution

174 records: univ sci & techn...
145 records: carnegie mellon ...
144 records: univ chinese aca...
144 records: tsinghua univ pe...
134 records: chinese univ hon...
110 records: zhejiang univ pe...
109 records: peng cheng lab p...
99 records: swiss fed inst t...
91 records: tsinghua univers...
90 records: shanghai ai lab ...
87 records: sensetime res pe...
86 records: shanghai jiao to...
83 records: zhejiang univers...
82 records: tech univ munich...
79 records: university of sc...
79 records: stanford univ st...
78 records: univ hong kong p...
77 records: australian natl ...
76 records: alibaba grp peop...
75 records: peng cheng labor...

Author

75 records: timofte radu
64 records: van gool luc
50 records: zhang lei
43 records: yang yi
37 records: loy chen change
36 records: tao dacheng
32 records: zhou jie
31 records: chen chen
30 records: liu yang
30 records: tian qi
29 records: sun jian
29 records: zha zheng-jun
28 records: li xin
27 records: qi tian
26 records: vasconcelos nuno
25 records: liu xiaoming
25 records: darrell trevor
24 records: zheng wei-shi
24 records: luo ping
24 records: ying shan

Language

11,863 records: English
25 records: other
1 record: Chinese

Search query: "Any field = 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,889 records in total; showing records 151-160


Improving Image Restoration through Removing Degradations in Textual Representations


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lin, Jingbo; Zhang, Zhilu; Wei, Yuxiang; Ren, Dongwei; Jiang, Dongsheng; Tian, Qi; Zuo, Wangmeng
Affiliations: Harbin Inst Technol, Harbin, Peoples R China; Huawei Cloud Comp Co Ltd, Shenzhen, Peoples R China

ISBN: (print) 9798350353013; 9798350353006

In this paper, we introduce a new perspective for improving image restoration by removing degradation in the textual representations of a given degraded image. Intuitively, restoration is much easier in the text modality than in the image modality. For example, it can be easily conducted by removing degradation-related words while keeping the content-aware words. Hence, we combine the advantages of images in detail description and those of text in degradation removal to perform restoration. To address the cross-modal assistance, we propose to map the degraded images into textual representations for removing the degradations, and then convert the restored textual representations into a guidance image for assisting image restoration. In particular, we ingeniously embed an image-to-text mapper and text restoration module into CLIP-equipped text-to-image models to generate the guidance. Then, we adopt a simple coarse-to-fine approach to dynamically inject multi-scale information from guidance to image restoration networks. Extensive experiments are conducted on various image restoration tasks, including deblurring, dehazing, deraining, denoising, and all-in-one image restoration. The results showcase that our method outperforms state-of-the-art ones across all these tasks. The codes and models are available at https://***/mrluin/TextualDegRemoval.

Keywords: image restoration low-level vision


Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Reddy, Arun; Paul, William; Rivera, Corban; Shah, Ketul; de Melo, Celso M.; Chellappa, Rama
Affiliations: Johns Hopkins Univ, Baltimore, MD 21218 USA; Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD USA; DEVCOM US Army Res Lab, Aberdeen Proving Ground, MD USA

ISBN: (print) 9798350353006

In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.

Keywords: action recognition domain adaptation masked modeling vuda


Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: He, Xianhua; Liang, Dashuang; Yang, Song; Hao, Zhanlong; Ma, Hui; Mao, Binjie; Li, Xi; Wang, Yao; Yan, Pengfei; Liu, Ajian
Affiliations: Meituan Vis AI Dept, Beijing, Peoples R China; MUST, Taipa, Macao, Peoples R China; CASIA MAIS, Beijing, Peoples R China

ISBN: (print) 9798350365474

Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods address both physical and digital attacks within one model, which makes it necessary to develop and maintain multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly contains two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improves the capability of the model to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC can achieve state-of-the-art generalization in Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in "Unified Physical-Digital Face Attack Detection" of the 5th Face Anti-spoofing Challenge@CVPR2024. Our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER, respectively. Our code is available at https://***/Xianhua-He/cvpr2024-faceanti-spoofing-challenge.

Keywords: Face recognition


SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Carter, Jonathan F.; Jorge, Joao; Gibson, Oliver; Tarassenko, Lionel
Affiliations: Univ Oxford, Inst Biomed Engn, Oxford, England; Oxehealth Ltd, Oxford, England

ISBN: (print) 9798350353006

Advances in camera-based physiological monitoring have enabled the robust, non-contact measurement of respiration and the cardiac pulse, which are known to be indicative of the sleep stage. This has led to research into camera-based sleep monitoring as a promising alternative to "gold-standard" polysomnography, which is cumbersome, expensive to administer, and hence unsuitable for longer-term clinical studies. In this paper, we introduce SleepVST, a transformer model which enables state-of-the-art performance in camera-based sleep stage classification (sleep staging). After pre-training on contact sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of 0.75 and 0.77 respectively. We then show that SleepVST can be successfully transferred to cardio-respiratory waveforms extracted from video, enabling fully contact-free sleep staging. Using a video dataset of 50 nights, we achieve a total accuracy of 78.8% and a Cohen's kappa of 0.71 in four-class video-based sleep staging, setting a new state-of-the-art in the domain.

Keywords: computer vision remote monitoring sleep staging transformers


Do Vision and Language Encoders Represent the World Similarly?


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Maniparambil, Mayug; Akshulakov, Raiymbek; Djilali, Yasser Abdelaziz Dahou; Seddik, Mohamed El Amine; Narayan, Sanath; Mangalam, Karttikeya; O'Connor, Noel E.
Affiliations: Dublin City Univ, ML Labs, Dublin, Ireland; Univ Calif Berkeley, Berkeley, CA 94720 USA; Technol Innovat Inst, Dublin, Ireland

ISBN: (print) 9798350353006

Aligned text-image encoders such as CLIP have become the de-facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performance in their respective domains. This raises a central question: does an alignment exist between uni-modal vision and language encoders since they fundamentally represent the same physical world? Analyzing the latent space structure of vision and language models on image-caption benchmarks using the Centered Kernel Alignment (CKA), we find that the representation spaces of unaligned and aligned encoders are semantically similar. In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training. We frame this as a seeded graph-matching problem exploiting the semantic similarity between graphs and propose two methods: a Fast Quadratic Assignment Problem optimization, and a novel localized CKA metric-based matching/retrieval. We demonstrate the effectiveness of this on several downstream tasks including cross-lingual, cross-domain caption matching and image classification. Code available at ***/mayug/0-shot-llm-vision.

Keywords: CLIP Unified Representations Vision Language Zero-shot


WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Khiem Vuong; Reddy, N. Dinesh; Tamburo, Robert; Narasimhan, Srinivasa G.
Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213 USA; Amazon, Seattle, WA USA

ISBN: (print) 9798350353006

Current methods for 2D and 3D object understanding struggle with severe occlusions in busy urban environments, partly due to the lack of large-scale labeled groundtruth annotations for learning occlusion. In this work, we introduce a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. By leveraging off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth, unoccluded 3D objects are identified automatically and composited into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations. The resulting clip-art image with pseudo-groundtruth enables efficient training of object reconstruction methods that are robust to occlusions. Our method demonstrates significant improvements in both 2D and 3D reconstruction, particularly in scenarios with heavily occluded objects like vehicles and people in urban scenes.

Keywords: 3D from single images computer vision


Cross-view and Cross-pose Completion for 3D Human Understanding


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Armando, Matthieu; Galaaoui, Salma; Baradel, Fabien; Lucas, Thomas; Leroy, Vincent; Bregier, Romain; Weinzaepfel, Philippe; Rogez, Gregory
Affiliations: NAVER LABS Europe, Meylan, France

ISBN: (print) 9798350353013; 9798350353006

Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.

Keywords: hand-centric human-centric pretraining representation learning vision transformer


Probing the 3D Awareness of Visual Foundation Models


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: El Banani, Mohamed; Raj, Amit; Maninis, Kevis-Kokitsi; Kar, Abhishek; Li, Yuanzhen; Rubinstein, Michael; Sun, Deqing; Guibas, Leonidas; Johnson, Justin; Jampani, Varun
Affiliations: Univ Michigan, Ann Arbor, MI 48109 USA; Google, Mountain View, CA 94043 USA; Stability AI, London, ON Canada

ISBN: (print) 9798350353006

Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are also useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure. In this work, we analyze the 3D awareness of visual foundation models. We posit that 3D awareness implies that representations (1) encode the 3D structure of the scene and (2) consistently represent the surface across views. We conduct a series of experiments using task-specific probes and zero-shot inference procedures on frozen features. Our experiments reveal several limitations of the current models. Our code and analysis can be found at https://***/mbanani/probe3d.

Keywords: 3D Awareness 3D Vision Foundation Models Representation Learning


Grounding Everything: Emerging Localization Properties in Vision-Language Transformers


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bousselham, Walid; Petersen, Felix; Ferrari, Vittorio; Kuehne, Hilde
Affiliations: Univ Bonn, Bonn, Germany; Goethe Univ Frankfurt, Frankfurt, Germany; Stanford Univ, Stanford, CA 94305 USA; Synthesia Io, London, England; MIT IBM Watson AI Lab, Cambridge, MA USA

ISBN: (print) 9798350353013; 9798350353006

Vision-language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning. But so far, those models seem to fall behind when it comes to zero-shot localization of referential expressions and objects in images. As a result, they need to be fine-tuned for this task. In this paper, we show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning. To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery [17] to a self-self attention path. We show that the concept of self-self attention corresponds to clustering, thus enforcing groups of tokens arising from the same object to be similar while preserving the alignment with the language space. To further guide the group formation, we propose a set of regularizations that allows the model to finally generalize across datasets and backbones. We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation. GEM not only outperforms other training-free open-vocabulary localization methods, but also achieves state-of-the-art results on the recently proposed OpenImagesV7 large-scale segmentation benchmark.

Keywords: CLIP open-vocabulary zero-shot segmentation vision-language model


A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Peirone, Simone Alberto; Pistilli, Francesca; Alliegro, Antonio; Averta, Giuseppe
Affiliations: Politecnico Torino, Turin, Italy; Ist Italiano Tecnol, Genoa, Italy

ISBN: (print) 9798350353006

Human comprehension of a video stream is naturally broad: in a few instants, we are able to understand what is happening, the relevance and relationship of objects, and forecast what will follow in the near future, all at once. We believe that, to effectively transfer such a holistic perception to intelligent machines, an important role is played by learning to correlate concepts and to abstract knowledge coming from different tasks, to synergistically exploit them when learning novel skills. To accomplish this, we look for a unified approach to video understanding which combines shared temporal modelling of human actions with minimal overhead, to support multiple downstream tasks and enable cooperation when learning novel skills. We then propose EgoPack, a solution that creates a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights, as a backpack of skills that a robot can carry around and use when needed. We demonstrate the effectiveness and efficiency of our approach on four Ego4D benchmarks, outperforming current state-of-the-art methods. Project webpage: ***/EgoPack.

Keywords: Egocentric Vision Video Understanding


Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
