ISBN: (Print) 0818672587
Combination of Multiple Classifiers (CMC) has recently drawn attention as a method of improving classification accuracy. This paper presents a method for combining classifiers that uses estimates of each individual classifier's local accuracy in small regions of feature space surrounding an unknown test sample. Only the output of the most locally accurate classifier is considered. We address issues of 1) optimization of individual classifiers, and 2) the effect of varying the sensitivity of the individual classifiers on the CMC algorithm. Our algorithm performs better on data from a real problem in mammogram image analysis than do other recently proposed CMC techniques.
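To make the selection step concrete, here is a minimal sketch of dynamic classifier selection by local accuracy: for a test sample, each base classifier is scored on the sample's nearest neighbors in a held-out validation set, and only the most locally accurate classifier's output is used. The base classifiers, the neighborhood size, and the validation split are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# toy data split into train / validation (for local accuracy) / test
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# two hypothetical base classifiers
classifiers = [LogisticRegression(max_iter=1000).fit(X_train, y_train),
               DecisionTreeClassifier(random_state=0).fit(X_train, y_train)]

knn = NearestNeighbors(n_neighbors=15).fit(X_val)  # defines the "small region" around a test sample

def predict_locally_best(x):
    """Use only the output of the classifier that is most accurate near x."""
    _, idx = knn.kneighbors(x.reshape(1, -1))
    local_X, local_y = X_val[idx[0]], y_val[idx[0]]
    local_acc = [clf.score(local_X, local_y) for clf in classifiers]
    return classifiers[int(np.argmax(local_acc))].predict(x.reshape(1, -1))[0]

preds = np.array([predict_locally_best(x) for x in X_test])
print("combined accuracy:", (preds == y_test).mean())
```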
ISBN: (Print) 9781479951178
Humans are capable of perceiving a scene at a glance, and obtain deeper understanding with additional time. Similarly, visual recognition deployments should be robust to varying computational budgets. Such situations require Anytime recognition ability, which is rarely considered in computer vision research. We present a method for learning dynamic policies to optimize Anytime performance in visual architectures. Our model sequentially orders feature computation and performs subsequent classification. Crucially, decisions are made at test time and depend on observed data and intermediate results. We show the applicability of this system to standard problems in scene and object recognition. On suitable datasets, we can incorporate a semantic back-off strategy that gives maximally specific predictions for a desired level of accuracy; this provides a new view on the time course of human visual perception.
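As a rough illustration of Anytime evaluation under a computational budget, the sketch below replaces the learned dynamic policy with a fixed greedy ordering of features by importance per unit cost, mean-imputing features that have not yet been computed; the per-feature costs and the dataset are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
costs = np.random.default_rng(0).uniform(1.0, 5.0, X.shape[1])  # hypothetical per-feature costs
order = np.argsort(-clf.feature_importances_ / costs)            # cheap, informative features first
means = X_tr.mean(axis=0)

def anytime_predict(x, budget):
    """Return a prediction using only the features affordable within `budget`."""
    filled, spent = means.copy(), 0.0
    for j in order:
        if spent + costs[j] > budget:
            break
        filled[j], spent = x[j], spent + costs[j]
    return clf.predict(filled.reshape(1, -1))[0]

for budget in (10.0, 40.0, costs.sum()):
    acc = np.mean([anytime_predict(x, budget) == t for x, t in zip(X_te, y_te)])
    print(f"budget={budget:6.1f}  accuracy={acc:.3f}")
```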
ISBN: (Print) 9781424469840
We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely-sampled feature patches and their corresponding votes in a spatio-temporal action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough-voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
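Only the voting and accumulation step is sketched below: each patch casts a weighted vote for a spatio-temporal action center, and the accumulator peak is read off as the detection. The random trees that would map real patches to votes are replaced here by synthetic votes, so every number is a placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

H, W, T = 120, 160, 60                     # frame height, width, number of frames
accumulator = np.zeros((T, H, W))          # spatio-temporal Hough space

rng = np.random.default_rng(0)
true_center = np.array([30, 60, 80])       # hypothetical (t, y, x) action center

for _ in range(500):
    # a Hough-tree leaf would emit an offset vote and a class weight for a patch;
    # here we simulate noisy votes aimed at the true center
    vote = true_center + rng.normal(0, 2, size=3)
    weight = rng.uniform(0.2, 1.0)
    t, y, x = np.clip(np.round(vote).astype(int), 0, [T - 1, H - 1, W - 1])
    accumulator[t, y, x] += weight

# smooth the accumulator and take its maximum as the detected action center
peak = np.unravel_index(np.argmax(gaussian_filter(accumulator, sigma=2)), accumulator.shape)
print("detected (t, y, x):", peak, " true:", tuple(true_center))
```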
ISBN: (Print) 9781424469840
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, then combines them in an ad-hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results.
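A minimal sketch of the latent-variable scoring this describes: the score of an action is the maximum over candidate poses of a joint (image, pose, action) score, so the best pose is inferred as a by-product of recognizing the action. The features, candidate poses, and weights below are synthetic placeholders standing in for the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_pose_candidates, feat_dim = 5, 20, 32

W = rng.normal(size=(n_actions, feat_dim))   # one linear scoring model per action (assumed)

def joint_features(image_feat, pose):
    """Hypothetical joint feature of image appearance and a pose hypothesis."""
    return image_feat * np.cos(pose) + np.roll(image_feat, 1) * np.sin(pose)

def score_action(image_feat, action, poses):
    # latent-variable inference: maximize the joint score over candidate poses
    scores = [W[action] @ joint_features(image_feat, p) for p in poses]
    best = int(np.argmax(scores))
    return scores[best], poses[best]

image_feat = rng.normal(size=feat_dim)
poses = rng.uniform(0, np.pi, size=n_pose_candidates)
per_action = [score_action(image_feat, a, poses) for a in range(n_actions)]
pred = int(np.argmax([s for s, _ in per_action]))
print("predicted action:", pred, " inferred pose:", per_action[pred][1])
```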
ISBN: (Print) 9781728132938
Object detection in aerial images is an active yet challenging task in computer vision because of the bird's-eye view perspective, the highly complex backgrounds, and the varied appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and the objects, which leads to a common misalignment between the final object classification confidence and the localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of the RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. The RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to Light-Head R-CNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.
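To illustrate the kind of spatial transformation involved, here is a sketch of decoding a rotated RoI from a horizontal RoI plus predicted offsets (dx, dy, dw, dh, dtheta). The exact parameterization used by RoI Transformer may differ; this decoding and the numbers below are assumptions.

```python
import numpy as np

def decode_rotated_roi(hroi, deltas):
    """hroi: (x1, y1, x2, y2) horizontal RoI.
    deltas: (dx, dy, dw, dh, dtheta) predicted transformation parameters.
    Returns (cx, cy, w, h, theta) of the rotated RoI."""
    x1, y1, x2, y2 = hroi
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh, dtheta = deltas
    # shift the center relative to the RoI size, scale the size in log-space
    return (cx + dx * w, cy + dy * h, w * np.exp(dw), h * np.exp(dh), dtheta)

def rotated_roi_corners(cx, cy, w, h, theta):
    """Corner points of the rotated RoI, e.g. for rotated RoI pooling."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    corners = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) * 0.5
    return corners @ R.T + np.array([cx, cy])

roi = (100, 50, 180, 120)   # hypothetical horizontal proposal
print(rotated_roi_corners(*decode_rotated_roi(roi, (0.05, -0.02, 0.1, -0.1, 0.3))))
```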
ISBN: (Print) 0818672587
There are at least two situations in practical computer vision where displacement of a point in an image is accompanied by a defocus blur. The first is when a camera of limited autofocal capability moves in depth, and the second is when a limited autofocal camera zooms. Motion and zooming are two popular strategies for acquiring more detail or for acquiring depth. The defocus blur has been considered noise or at best been ignored. However, the defocus blur is in itself a cue to depth, and hence we proceed to show how it can be calculated simultaneously with affine motion. We first introduce the theory, then develop a solution method and finally demonstrate the validity of the theory and the solution by conducting experiments with real scenery.
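The depth cue the abstract builds on can be seen from the thin-lens model: the blur-circle diameter of a point depends on its depth together with the aperture, focal length, and sensor distance, so measured blur constrains depth. The sketch below uses made-up camera parameters, not values from the paper.

```python
def blur_diameter(u, f=0.050, sensor_dist=0.052, aperture=0.025):
    """Thin-lens blur-circle diameter (meters) for a point at depth u (meters).
    The in-focus condition is 1/f = 1/u + 1/sensor_dist; blur grows as the
    point moves away from the in-focus depth."""
    return aperture * sensor_dist * abs(1.0 / f - 1.0 / u - 1.0 / sensor_dist)

# with these (hypothetical) parameters the in-focus depth is 1.3 m
for depth in (0.8, 1.3, 2.0, 5.0):
    print(f"depth {depth:4.1f} m -> blur {1e3 * blur_diameter(depth):.3f} mm")
```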
ISBN: (Print) 9781479951178
A method is presented for identifying local shape features on a shape's boundary in a way that is facilitated by the presence of noise. The boundary is seen as a real function. A study of a certain distance function reveals, almost counter-intuitively, that vertices can be defined and localized better in the presence of noise; thus the concept of noising, as opposed to smoothing, is conceived and presented. The method works on both smooth and noisy shapes, with the presence of noise improving on the results of the smoothed version. Experiments with noise and a comparison to the state of the art validate the method.
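The abstract does not spell out its distance function, so the sketch below uses a generic stand-in for vertex localization on a noisy boundary: for each boundary point, the distance to the chord through its k-th neighbors on either side; peaks of this quantity mark vertex candidates. It illustrates the setting, not the paper's actual construction.

```python
import numpy as np

def chord_distance(boundary, k=8):
    """Per-point distance from the boundary to the chord through its +/-k neighbors."""
    n = len(boundary)
    d = np.zeros(n)
    for i in range(n):
        a, b, p = boundary[(i - k) % n], boundary[(i + k) % n], boundary[i]
        ab, ap = b - a, p - a
        d[i] = abs(ab[0] * ap[1] - ab[1] * ap[0]) / (np.linalg.norm(ab) + 1e-12)
    return d

# noisy square boundary: the four corners should appear as peaks of the distance
s = np.linspace(0, 1, 100, endpoint=False)
square = np.concatenate([np.stack([s, np.zeros_like(s)], 1),
                         np.stack([np.ones_like(s), s], 1),
                         np.stack([1 - s, np.ones_like(s)], 1),
                         np.stack([np.zeros_like(s), 1 - s], 1)])
noisy = square + np.random.default_rng(0).normal(0, 0.01, square.shape)
d = chord_distance(noisy)
# neighbouring indices around a corner score similarly; a peak-picking step would
# follow in practice (the corners sit near indices 0, 100, 200, 300)
print("strongest vertex candidates:", np.argsort(d)[-4:])
```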
ISBN: (Print) 9781665445092
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms the comparable state of the art on the AVA dataset.
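As a schematic of the object-centric idea, the sketch below turns per-object features from sparsely sampled frames into tokens and lets a standard transformer encoder aggregate them into a clip-level prediction; the shapes, module choices, and pooling are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

num_frames, objects_per_frame, feat_dim, num_classes = 64, 8, 256, 10

# one token per detected object per sampled frame (random stand-in features)
object_tokens = torch.randn(1, num_frames * objects_per_frame, feat_dim)  # [batch, tokens, dim]

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
    num_layers=2,
)
classifier = nn.Linear(feat_dim, num_classes)

clip_repr = encoder(object_tokens).mean(dim=1)  # pool over all object tokens
logits = classifier(clip_repr)
print(logits.shape)                             # torch.Size([1, 10])
```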
ISBN: (Digital) 9781665469463
ISBN: (Print) 9781665469463
Rotated object detection is a challenging issue in the computer vision field. Inadequate rotated representations and the confusion of parametric regression have been the bottleneck for high-performance rotated detection. In this paper, we propose an orientation-sensitive keypoint-based rotated detector, OSKDet. First, we adopt a set of keypoints to represent the target and predict the keypoint heatmap on the RoI to obtain the rotated box. By proposing the orientation-sensitive heatmap, OSKDet learns the shape and direction of the rotated target implicitly and has stronger modeling capabilities for rotated representation, which improves localization accuracy and yields high-quality detection results. Second, we explore a new unordered keypoint representation paradigm, which avoids the confusion of keypoint regression caused by rule-based ordering. Furthermore, we propose a localization quality uncertainty module to better predict the classification score from the distribution uncertainty of the keypoint heatmap. Experimental results on several public benchmarks show the state-of-the-art performance of OSKDet. Specifically, we achieve an AP of 80.91% on DOTA, 89.98% on HRSC2016, and 97.27% on UCAS-AOD, and an F-measure of 92.18% on ICDAR2015 and 81.43% on ICDAR2017.
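Only the last step, going from an unordered set of keypoints to a rotated box, is sketched here; fitting a minimum-area rectangle avoids imposing any keypoint ordering. The heatmap prediction itself is omitted and the keypoints below are synthetic, so this is a stand-in for the decoding described in the abstract.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

# synthetic keypoints scattered along the long edges of a rotated rectangle
theta, cx, cy, w, h = np.deg2rad(30), 100.0, 80.0, 60.0, 30.0
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unit = rng.uniform(-0.5, 0.5, size=(12, 2)) * [w, h]
unit[:, 1] = np.sign(unit[:, 1]) * h / 2            # push points onto the long edges
keypoints = (unit @ R.T + [cx, cy]).astype(np.float32)

# decode an oriented box from the unordered keypoint set
(rcx, rcy), (rw, rh), angle = cv2.minAreaRect(keypoints)
print(f"center=({rcx:.1f},{rcy:.1f}) size=({rw:.1f},{rh:.1f}) angle={angle:.1f} deg")
```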
ISBN: (Print) 9781467369640
We frame the problem of local representation of imaging data as the computation of minimal sufficient statistics that are invariant to nuisance variability induced by viewpoint and illumination. We show that, under very stringent conditions, these are related to "feature descriptors" commonly used in computer vision. Such conditions can be relaxed if multiple views of the same scene are available. We propose a sampling-based and a point-estimate-based approximation of such a representation, compared empirically on image-to-(multiple) image matching, for which we introduce a multi-view wide-baseline matching benchmark, consisting of a mixture of real and synthetic objects with ground-truth camera motion and dense three-dimensional geometry.
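A toy sketch of why multiple views help: descriptors of the same scene point observed under different nuisances are pooled before matching, and the pooled representation matches a new view more reliably than a single-view descriptor. The vectors below are synthetic, standing in for an actual descriptor pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_views, dim = 50, 4, 64

canonical = rng.normal(size=(n_points, dim))                       # "true" appearance of each point
views = canonical[None] + rng.normal(0, 1.5, (n_views, n_points, dim))  # nuisance variability per view

single_view = views[0]                                             # descriptor from a single view
multi_view = views.mean(axis=0)                                    # pooled multi-view descriptor
query = canonical + rng.normal(0, 1.5, (n_points, dim))            # a new view to match against

def match_accuracy(desc):
    d = ((query[:, None, :] - desc[None, :, :]) ** 2).sum(-1)      # pairwise squared distances
    return (d.argmin(axis=1) == np.arange(n_points)).mean()

print("single-view matching accuracy:", match_accuracy(single_view))
print("multi-view  matching accuracy:", match_accuracy(multi_view))
```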