ISBN (print): 9798350353013; 9798350353006
Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy, relying on general-purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. We therefore propose a self-supervised pre-training approach that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked, and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D structure as well as human motion. We pre-train one model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance, for instance when fine-tuned for model-based and model-free human mesh recovery.
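As an illustration of the pairing idea, the following is a minimal sketch of a cross-image masked-reconstruction training step in the spirit of this abstract. The model signature, `patchify`, and all names are hypothetical stand-ins, not the authors' released code; it assumes a ViT-style encoder-decoder that sees the visible patches of one image plus a fully visible second view.

```python
# Sketch only: one training step of cross-view/cross-pose masked reconstruction.
import torch
import torch.nn as nn

def patchify(img, p=16):
    # (B, C, H, W) -> (B, N, C*p*p) non-overlapping patches
    B, C, H, W = img.shape
    img = img.unfold(2, p, p).unfold(3, p, p)            # (B, C, H/p, W/p, p, p)
    return img.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)

def training_step(model, view_a, view_b, mask_ratio=0.75):
    """view_a is masked and reconstructed; view_b (stereo pair or later frame) stays visible."""
    patches_a = patchify(view_a)
    B, N, D = patches_a.shape
    n_keep = int(N * (1 - mask_ratio))
    perm = torch.rand(B, N).argsort(dim=1)               # per-sample random masking
    keep, masked = perm[:, :n_keep], perm[:, n_keep:]
    visible_a = torch.gather(patches_a, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    # Hypothetical model: encodes visible patches of A plus the full second image,
    # then predicts the pixels of A's masked patches.
    pred = model(visible_a, keep, masked, patchify(view_b))   # (B, N - n_keep, D)
    target = torch.gather(patches_a, 1, masked.unsqueeze(-1).expand(-1, -1, D))
    return nn.functional.mse_loss(pred, target)
```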
ISBN (print): 9798350353006
Given the power of vision transformers, a new learning paradigm, pre-training and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat to this paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., convert a benign model into a backdoored one. Once in backdoor mode, a specific trigger can force the model to predict a target class. This poses a severe risk to users of cloud APIs, since the malicious behavior cannot be activated or detected in benign mode, making the attack very stealthy. To attack a pre-trained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token. They are optimized with a clean loss, which encourages the model to behave normally even when the trigger is present, and a backdoor loss, which ensures the backdoor can be activated by the trigger when the switch is on. In addition, we use cross-mode feature distillation to reduce the effect of the switch token on clean samples. Experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, which achieves a 95%+ attack success rate while remaining hard to detect and remove. Our code is available at https://***/20000yshust/SWARM.
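To make the two-term objective concrete, here is a hedged sketch of how a clean loss and a backdoor loss could be combined around a switch token. The `model(images, prompts)` interface, `trigger`, `prompts`, and `switch` tensors are all hypothetical assumptions, not the paper's implementation.

```python
# Sketch of a switchable-backdoor objective: normal behavior without the switch
# token, targeted misclassification once the switch token is appended.
import torch
import torch.nn.functional as F

def swarm_style_loss(model, x, y, trigger, prompts, switch, target_class):
    x_trig = (x + trigger).clamp(0, 1)                  # stamp the trigger on the input
    # Benign mode: even with the trigger present, predictions should stay correct.
    clean_logits = model(x_trig, prompts)
    clean_loss = F.cross_entropy(clean_logits, y)
    # Backdoor mode: appending the switch token flips predictions to the target class.
    bd_logits = model(x_trig, torch.cat([prompts, switch], dim=0))
    bd_loss = F.cross_entropy(bd_logits, torch.full_like(y, target_class))
    return clean_loss + bd_loss
```

The clean term is what makes the attack stealthy: without the switch token, the triggered input behaves like any other sample, so standard backdoor scans of the benign mode find nothing.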
ISBN (print): 9798350353006
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions. To alleviate these issues, we introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality without any training effort. The recurrent unit is a two-stage segmenter built upon a frozen VLM. Our model thus retains the VLM's broad vocabulary space while gaining segmentation ability. Experiments show that our method outperforms not only its training-free counterparts but also methods fine-tuned on millions of data samples, setting new state-of-the-art records for both zero-shot semantic and referring segmentation. Concretely, we improve the current record by 28.8, 16.0, and 6.9 mIoU on Pascal VOC, COCO Object, and Pascal Context, respectively.
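One plausible reading of the recurrent unit is a filter-and-segment loop, sketched below under our own assumptions: `segmenter` stands in for the frozen-VLM two-stage segmenter and is assumed to return a mask and a confidence score per text; the threshold and iteration count are illustrative, not the paper's.

```python
# Sketch: recurrently drop low-confidence texts so surviving ones compete again,
# which can sharpen the remaining masks on the next pass.
def recurrent_segment(segmenter, image, texts, n_iters=3, keep_thresh=0.5):
    active = list(texts)
    masks = {}
    for _ in range(n_iters):
        masks = {t: segmenter(image, t) for t in active}   # t -> (mask, score)
        active = [t for t, (m, s) in masks.items() if s >= keep_thresh]
        if not active:
            break
    return {t: m for t, (m, s) in masks.items() if t in active}
```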
ISBN (print): 9798350365474
Instance-based semantic segmentation provides detailed per-pixel scene understanding information crucial for both computer vision and robotics applications. However, state-of-the-art approaches such as Mask2Former are computationally expensive, and reducing this computational burden while maintaining high accuracy remains challenging. Knowledge distillation has been regarded as a potential way to compress neural networks, but to date limited work has explored how to distill information from the output queries of a model such as Mask2Former. In this paper, we match the output queries of the student and teacher models to enable a query-based knowledge distillation scheme. We independently match the teacher and the student to the ground truth and use this to define the teacher-to-student relationship for distillation. Using this approach, we show that knowledge distillation is possible even when the student model has fewer queries and its backbone is changed from a Transformer architecture to a convolutional neural network. Experiments on two challenging agricultural datasets, sweet pepper (BUP20) and sugar beet (SB20), as well as Cityscapes, demonstrate the efficacy of our approach. Across the three datasets, the student models obtain an average absolute improvement in AP of 1.8 and 1.9 points for the ResNet-50 and Swin-Tiny backbones, respectively. To the best of our knowledge, this is the first work to propose knowledge distillation schemes for instance semantic segmentation with transformer-based models.
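The pairing-through-ground-truth idea can be sketched as follows: Hungarian-match teacher queries and student queries to the ground truth separately, then pair queries that share a ground-truth instance. The `cost` function (e.g., mask plus class costs) and the MSE distillation term are our assumptions, not the authors' exact formulation.

```python
# Sketch: query-based knowledge distillation via independent GT matching.
import torch
from scipy.optimize import linear_sum_assignment

def match_to_gt(queries, gts, cost):
    # Hungarian matching; rows = queries, cols = ground-truth instances.
    C = torch.stack([torch.stack([cost(q, g) for g in gts]) for q in queries])
    q_idx, gt_idx = linear_sum_assignment(C.detach().cpu().numpy())
    return dict(zip(gt_idx.tolist(), q_idx.tolist()))       # gt index -> query index

def query_kd_loss(t_queries, s_queries, gts, cost):
    t_map = match_to_gt(t_queries, gts, cost)
    s_map = match_to_gt(s_queries, gts, cost)
    # Teacher and student queries assigned to the same GT instance form a pair;
    # pull the student query toward the (frozen) teacher query.
    pairs = [(t_map[g], s_map[g]) for g in t_map if g in s_map]
    return torch.stack([
        torch.nn.functional.mse_loss(s_queries[si], t_queries[ti].detach())
        for ti, si in pairs
    ]).mean()
```

Matching both models to the ground truth, rather than to each other, sidesteps the problem that teacher and student query sets can differ in size and ordering.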
ISBN (print): 9798350353006
The spatial non-uniformity and diverse patterns of shadow degradation conflict with the weight-sharing manner of dominant models, which may lead to an unsatisfactory compromise. To tackle this issue, we present a novel strategy from the viewpoint of shadow transformation: directly homogenizing the spatial distribution of shadow degradation. Our key design is a random shuffle operation and its corresponding inverse operation. Specifically, the random shuffle operation stochastically rearranges pixels across the spatial dimensions, and the inverse operation recovers the original order. After random shuffling, the shadow diffuses across the whole image and the degradation appears in a homogenized way, which can be effectively processed by a local self-attention layer. Moreover, we devise a new feed-forward network with position modeling to exploit image structural information. Based on these elements, we construct a local-window-based transformer named HomoFormer for image shadow removal. HomoFormer enjoys the linear complexity of local transformers while bypassing the challenges posed by the non-uniformity and diversity of shadow. Extensive experiments on public datasets verify the superiority of HomoFormer. Code is available at https://***/jiexiaou/HomoFormer.
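The shuffle/inverse pair is simple to state in code. Below is a minimal sketch in PyTorch under our own naming; it shuffles pixel positions with a shared permutation per batch and inverts it exactly via the argsort of the permutation.

```python
# Sketch: random spatial shuffle and its exact inverse.
import torch

def random_shuffle(x):
    # x: (B, C, H, W) -> shuffled tensor plus the permutation needed to undo it
    B, C, H, W = x.shape
    perm = torch.randperm(H * W, device=x.device)
    flat = x.flatten(2)                       # (B, C, H*W)
    return flat[:, :, perm].view(B, C, H, W), perm

def inverse_shuffle(x, perm):
    inv = perm.argsort()                      # inverse permutation
    return x.flatten(2)[:, :, inv].view(*x.shape)
```

After shuffling, shadow pixels are spread roughly uniformly over the image, so every local self-attention window sees a similar mix of shadowed and clean pixels, which is exactly the homogenization the abstract describes.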
ISBN (print): 9798350365474
Accurate motion capture is useful for sports motion analysis but incurs high acquisition costs. Monocular or few-camera multi-view pose estimation provides an accessible but less accurate alternative, especially for sports motion, because models are trained on datasets of daily activities. In addition, multi-view estimation is still costly due to camera calibration. It is therefore desirable to develop an accurate and cost-effective motion capture system for daily training in sports. In this paper, we propose an accurate and convenient sports motion capture system based on unsupervised fine-tuning. The proposed system estimates 3D joint positions by multi-view estimation with automatic calibration using the human body. These results are used as pseudo-labels to fine-tune a recent high-performance monocular 3D pose estimation model. Since fine-tuning improves the model's accuracy on sports motion, we can choose multi-view or monocular estimation depending on the situation. We evaluated the system using a running motion dataset and ASPset-510, and showed that fine-tuning improves monocular estimation to the same level as multi-view estimation for running motion. Our proposed system can be useful for daily motion analysis in sports.
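The pseudo-label fine-tuning loop could look like the sketch below. The `triangulate` callable (standing in for the auto-calibrated multi-view estimator) and the monocular model interface are hypothetical; the loss shown is a standard MPJPE-style joint error, which we assume as a reasonable default.

```python
# Sketch: fine-tune a monocular 3D pose model on multi-view pseudo-labels.
import torch

def finetune_on_pseudo_labels(mono_model, triangulate, optimizer, clips):
    """clips: iterable of (multi_view_frames, mono_frame) pairs."""
    mono_model.train()
    for views, frame in clips:
        with torch.no_grad():
            pseudo_3d = triangulate(views)           # (J, 3) multi-view 3D joints
        pred_3d = mono_model(frame)                  # (J, 3) monocular prediction
        loss = (pred_3d - pseudo_3d).norm(dim=-1).mean()   # MPJPE-style loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```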
ISBN (print): 9798350353013; 9798350353006
CLIP has demonstrated marked progress in visual recognition thanks to its powerful pre-training on large-scale image-text pairs. However, a critical challenge remains: how to transfer image-level knowledge to pixel-level understanding tasks such as semantic segmentation. To address this challenge, we analyze the gap between the capability of the CLIP model and the requirements of the zero-shot semantic segmentation task. Based on our analysis and observations, we propose a novel method for zero-shot semantic segmentation, dubbed CLIP-RC (CLIP with Regional Clues), built on two main insights. On the one hand, a region-level bridge is necessary to provide fine-grained semantics. On the other hand, overfitting should be mitigated during the training stage. Benefiting from these discoveries, CLIP-RC achieves state-of-the-art performance on various zero-shot semantic segmentation benchmarks, including PASCAL VOC, PASCAL Context, and COCO-Stuff 164K. Code will be available at https://***/Jittor/JSeg.
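To illustrate what a region-level bridge between CLIP features and pixel labels might look like, here is a generic mask-pooling sketch under our own assumptions; it is not the CLIP-RC implementation, just one common way to turn patch features and region masks into region-class scores.

```python
# Sketch: score image regions against class text embeddings via mask pooling.
import torch

def region_level_logits(patch_feats, region_masks, text_embs, tau=0.01):
    """patch_feats: (N, D) CLIP patch features; region_masks: (R, N) soft masks;
    text_embs: (K, D) class text embeddings. Returns (R, K) region-class logits."""
    w = region_masks / region_masks.sum(dim=1, keepdim=True).clamp_min(1e-6)
    region_feats = w @ patch_feats                             # mask-pooled features
    region_feats = torch.nn.functional.normalize(region_feats, dim=-1)
    text_embs = torch.nn.functional.normalize(text_embs, dim=-1)
    return region_feats @ text_embs.t() / tau                  # scaled cosine similarity
```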
ISBN (print): 9798350353006
Applying a pre-trained large model to downstream tasks is prohibitive under resource-constrained conditions. Recent dominant approaches to efficiency add a few learnable parameters to a fixed backbone model. This strategy, however, still requires loading the large model for downstream fine-tuning with limited resources. In this paper, we propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage. To this end, we first employ low-rank approximation to compress the original large model, and then devise a feature distillation module and a weight perturbation regularization module, both specifically designed to enhance the low-rank model. In particular, we update only the low-rank model while freezing the backbone parameters during pre-training, which allows direct and efficient use of the low-rank model for downstream fine-tuning. The proposed method achieves efficiency in both required parameters and computation time while maintaining comparable results, with minimal modifications to the backbone architecture. Specifically, when applied to three vision-only and one vision-language Transformer models, our approach often shows a mere ~0.6-point decrease in performance while reducing the original parameter size by 1/3 to 2/3. We release our code at link.
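For intuition about the two main ingredients, the sketch below compresses one linear layer by truncated SVD and distills the frozen layer's features into the low-rank replacement. This is our illustration of the general technique under stated assumptions, not the paper's modules (and it omits the weight perturbation regularizer).

```python
# Sketch: truncated-SVD compression of a linear layer plus feature distillation.
import torch
import torch.nn as nn

def low_rank_factorize(linear, rank):
    U, S, Vh = torch.linalg.svd(linear.weight, full_matrices=False)
    A = nn.Linear(linear.in_features, rank, bias=False)
    B = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    # Split the singular values evenly so B(A(x)) ~= U S Vh x = W x.
    A.weight.data = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]        # (rank, in)
    B.weight.data = U[:, :rank] * S[:rank].sqrt()                   # (out, rank)
    if linear.bias is not None:
        B.bias.data = linear.bias.data.clone()
    return nn.Sequential(A, B)

def feature_distill_loss(frozen_layer, low_rank_layer, x):
    with torch.no_grad():
        target = frozen_layer(x)             # frozen backbone provides the target
    return nn.functional.mse_loss(low_rank_layer(x), target)
```

During the intermediate stage, only the low-rank factors receive gradients; the frozen layer serves purely as a distillation target, matching the abstract's description of updating the low-rank model alone.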
ISBN (print): 9798350353013; 9798350353006
The recent advent of pre-trained vision transformers has unveiled a promising property: their inherent capability to group semantically related visual concepts. In this paper, we explore harnessing this emergent feature to tackle few-shot semantic segmentation, the task of classifying pixels in a test image given only a few annotated examples. A critical hurdle in this endeavor is preventing overfitting to the limited classes seen while training the few-shot segmentation model. As our main discovery, we find that the concept of "relationship descriptors", initially conceived for enhancing the CLIP model for zero-shot semantic segmentation, offers a potential solution. We adapt and refine this concept into a relationship-descriptor construction tailored for few-shot semantic segmentation, extending its application across multiple layers to enhance performance. Building upon this adaptation, we propose a few-shot semantic segmentation framework that is not only easy to implement and train but also scales effectively with the number of support examples and categories. Through rigorous experimentation across various datasets, including PASCAL-5^i and COCO-20^i, we demonstrate a clear advantage of our method in diverse few-shot semantic segmentation scenarios and across a range of pre-trained vision transformer models. The findings show that our method significantly outperforms current state-of-the-art techniques, highlighting the effectiveness of harnessing the emerging capabilities of vision transformers for few-shot semantic segmentation. We release the code at https://***/ZiqinZhou66/***.
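As background for readers unfamiliar with relationship descriptors, the sketch below shows one plausible form of the CLIP-based construction the abstract references: each class embedding is combined with the image's global token before being matched against patch features. The concatenation layout and the projection head are our assumptions.

```python
# Sketch: a plausible relationship-descriptor construction and per-patch matching.
import torch

def relationship_descriptors(text_embs, cls_token):
    """text_embs: (K, D); cls_token: (D,). Returns (K, 2D) descriptors."""
    # The element-wise product ties each class embedding to this specific image,
    # which is the property credited with discouraging overfitting to seen classes.
    return torch.cat([text_embs * cls_token, text_embs], dim=-1)

def segmentation_logits(patch_feats, descriptors, proj):
    """patch_feats: (N, D); proj: a Linear(2D -> D) mapping descriptors back."""
    q = torch.nn.functional.normalize(proj(descriptors), dim=-1)    # (K, D)
    k = torch.nn.functional.normalize(patch_feats, dim=-1)          # (N, D)
    return k @ q.t()                                                # (N, K) class logits per patch
```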
ISBN (print): 9798350353006
Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability to certain fields, such as robotics. In this paper, we delve into open-ended question answering (QA) in long, egocentric videos, which allows individuals or robots to inquire about their own past visual experiences (e.g., "Where did I put the lettuce?"). This task presents unique challenges, including the complexity of temporally grounding queries within extensive video content, the high resource demands of precise data annotation, and the inherent difficulty of evaluating open-ended answers due to their ambiguity. Our proposed approach tackles these challenges by (i) integrating query grounding and answering within a unified model to reduce error propagation; (ii) employing large language models for efficient and scalable data synthesis; and (iii) introducing a closed-ended QA task for evaluation, to manage answer ambiguity. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art performance on the QAEgo4D and Ego4D-NLQ benchmarks. Code, data, and models are open-sourced.
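A common way to realize closed-ended evaluation like item (iii) is to score each candidate answer by its likelihood under the model and pick the best. The sketch below assumes a typical causal-LM `model`/`tokenizer` interface (hypothetical here) rather than the paper's actual evaluation code.

```python
# Sketch: multiple-choice QA evaluation via per-candidate log-likelihood.
import torch

@torch.no_grad()
def pick_choice(model, tokenizer, question, choices):
    scores = []
    for c in choices:
        ids = tokenizer(f"Q: {question}\nA: {c}", return_tensors="pt").input_ids
        out = model(ids, labels=ids)          # loss = mean token NLL of the sequence
        scores.append(-out.loss.item())       # higher log-likelihood is better
    return choices[max(range(len(choices)), key=scores.__getitem__)]
```

Because the answer is chosen from a fixed candidate set, exact-match accuracy becomes well-defined, which is precisely how a closed-ended task sidesteps the ambiguity of grading free-form answers.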