Search Results - Inner Mongolia University Library

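Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "excluding". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016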
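
The search rules above can be sketched as a small query builder. This is only an illustration: the helper names (`term`, `any_of`, `all_of`, `year_range`) are hypothetical and not part of the catalogue. It relies only on the field codes and the uppercase, space-delimited operators described in the help text.

```python
# Field codes listed in the catalogue's help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value clause, e.g. term("U", "C++")."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (uppercase, one space on each side)."""
    return " AND ".join(clauses)


def year_range(start: int, end: int) -> str:
    """Return a Y=start-end clause."""
    return term("Y", f"{start}-{end}")


# Rebuild the first search example from the help text.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    year_range(1982, 2016),
)
print(query)
```

Composing the string from helpers keeps the operators uppercase and correctly spaced, which the help text says the search form requires.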

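Refine search results

Document type

11,883 records: Conference
5 records: Journal articles

Holdings

11,888 records: Electronic documents
0 titles: Print holdings

Subject classification

8,055 records: Engineering
- 7,613 records: Computer Science and Technology...
- 796 records: Mechanical Engineering
- 688 records: Electrical Engineering
- 356 records: Software Engineering
- 225 records: Control Science and Engineering
- 40 records: Optical Engineering
- 19 records: Bioengineering
- 17 records: Information and Communication Engineering
- 12 records: Biomedical Engineering (degree conferrable in...
- 6 records: Electronic Science and Technology (degree...
- 6 records: Architecture
- 6 records: Transportation Engineering
- 5 records: Instrument Science and Technology
- 5 records: Chemical Engineering and Technology
- 5 records: Safety Science and Engineering
- 4 records: Civil Engineering
3,344 records: Medicine
- 3,343 records: Clinical Medicine
- 4 records: Basic Medicine (degree conferrable in medicine...
- 4 records: Public Health and Preventive Medi...
250 records: Science
- 198 records: Systems Science
- 29 records: Physics
- 21 records: Biology
- 15 records: Mathematics
- 9 records: Statistics (degree conferrable in science, ...
- 4 records: Chemistry
17 records: Management
- 12 records: Management Science and Engineering (degree...
- 7 records: Library, Information and Archives Manage...
- 5 records: Business Administration
3 records: Law
- 3 records: Sociology
3 records: Education
- 3 records: Education
2 records: Agriculture
1 record: Economics
1 record: Military Science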
Topic

5,632 records: computer vision
2,668 records: training
2,203 records: pattern recognit...
1,746 records: computational mo...
1,502 records: visualization
1,360 records: three-dimensiona...
1,074 records: semantics
999 records: benchmark testin...
986 records: codes
959 records: computer archite...
891 records: deep learning
777 records: conferences
754 records: task analysis
699 records: feature extracti...
561 records: transformers
533 records: face recognition
527 records: neural networks
495 records: object detection
490 records: image segmentati...
468 records: cameras

Institution

174 records: univ sci & techn...
145 records: carnegie mellon ...
144 records: univ chinese aca...
144 records: tsinghua univ pe...
134 records: chinese univ hon...
110 records: zhejiang univ pe...
109 records: peng cheng lab p...
99 records: swiss fed inst t...
91 records: tsinghua univers...
90 records: shanghai ai lab ...
87 records: sensetime res pe...
86 records: shanghai jiao to...
83 records: zhejiang univers...
82 records: tech univ munich...
79 records: university of sc...
79 records: stanford univ st...
78 records: univ hong kong p...
77 records: australian natl ...
76 records: alibaba grp peop...
75 records: peng cheng labor...

Author

75 records: timofte radu
64 records: van gool luc
50 records: zhang lei
43 records: yang yi
37 records: loy chen change
36 records: tao dacheng
32 records: zhou jie
31 records: chen chen
30 records: liu yang
30 records: tian qi
29 records: sun jian
29 records: zha zheng-jun
28 records: li xin
27 records: qi tian
26 records: vasconcelos nuno
25 records: liu xiaoming
25 records: darrell trevor
24 records: zheng wei-shi
24 records: luo ping
24 records: ying shan

Language

11,862 records: English
25 records: Other
1 record: Chinese

Search condition: "Any field=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,888 records in total; records 71-80 are shown below.
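
The record range above follows from ordinary pagination arithmetic. Below is a minimal sketch, assuming a page size of 10 records (inferred from the 71-80 range, not stated by the catalogue); the function names are hypothetical.

```python
def record_range(page: int, page_size: int, total: int) -> tuple[int, int]:
    """Return the 1-based (first, last) record numbers shown on `page`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    first = (page - 1) * page_size + 1
    if first > total:
        raise ValueError("page is past the end of the result set")
    last = min(page * page_size, total)
    return first, last


def page_count(total: int, page_size: int) -> int:
    """Return the number of pages needed to list `total` records."""
    return -(-total // page_size)  # ceiling division on integers


# Assumed page size of 10: page 8 covers records 71-80.
print(record_range(8, 10, 11888))
print(page_count(11888, 10))
```

Under that assumption, listing every record would take 1,189 pages, with the last page holding only 8 records.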

Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xia, Haifeng; Xia, Siyu; Ding, Zhengming

Affiliations: Southeast Univ Sch Automat Dhaka Bangladesh; Tulane Univ Dept Comp Sci New Orleans LA 70118 USA

ISBN: (print) 9798350353006

Source-free domain adaptation (SFDA) assumes that model adaptation only accesses the well-learned source model and unlabeled target instances for knowledge transfer. However, cross-domain distribution shift easily triggers invalid discriminative semantics from the source model when recognizing the target samples. Hence, understanding the specific content of discriminative patterns and adjusting their representation in the target domain become the key to overcoming SFDA. To achieve such a vision, this paper proposes a novel explanation paradigm, the "Discriminative Pattern Calibration (DPC)" mechanism, for solving the SFDA issue. Concretely, DPC first utilizes a learning network to infer the discriminative regions on the target images and specifically emphasizes them in feature space to enhance their representation. Moreover, DPC relies on the attention-reversed mixup mechanism to augment more samples and improve the robustness of the classifier. Considerable experimental results and studies suggest the effectiveness of our DPC in enhancing the performance of existing SFDA baselines.


ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Norouzi, Narges; Orlova, Svetlana; de Geus, Daan; Dubbelman, Gijs

Affiliations: Eindhoven Univ Technol Eindhoven Netherlands

ISBN: (print) 9798350353006

This work presents Adaptive Local-then-Global Merging (ALGM), a token reduction method for semantic segmentation networks that use plain Vision Transformers. ALGM merges tokens in two stages: (1) In the first network layer, it merges similar tokens within a small local window and (2) halfway through the network, it merges similar tokens across the entire image. This is motivated by an analysis in which we found that, in those situations, tokens with a high cosine similarity can likely be merged without a drop in segmentation quality. With extensive experiments across multiple datasets and network configurations, we show that ALGM not only significantly improves the throughput by up to 100%, but can also enhance the mean IoU by up to +1.1, thereby achieving a better trade-off between segmentation quality and efficiency than existing methods. Moreover, our approach is adaptive during inference, meaning that the same model can be used for optimal efficiency or accuracy, depending on the application. Code is available at https://***/ALGM.

Keywords: Efficient Vision Transformers; Semantic Segmentation; Token Merging
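
To illustrate the general idea of merging tokens with high cosine similarity inside a local window, here is a minimal sketch. It is not the authors' implementation: the 1-D non-overlapping windows, the fixed `threshold`, the mean-based merge, and all function names are assumptions made for illustration. The global merging stage is omitted.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_token(tokens: list[list[float]]) -> list[float]:
    """Element-wise mean of a group of token vectors."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def merge_local(tokens, window=2, threshold=0.9):
    """Replace each window by its mean if all its tokens resemble the first.

    Windows are non-overlapping slices of the token sequence (an assumption);
    windows that fail the similarity test are kept unchanged.
    """
    merged = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        anchor = group[0]
        similar = all(
            cosine_similarity(anchor, other) >= threshold for other in group[1:]
        )
        if len(group) > 1 and similar:
            merged.append(mean_token(group))
        else:
            merged.extend(group)
    return merged


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
reduced = merge_local(tokens, window=2, threshold=0.9)
print(len(tokens), "->", len(reduced))
```

In this toy input the first window holds two nearly parallel tokens and is merged, while the second holds orthogonal tokens and is left alone, so four tokens become three.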


Enhancing Vision-Language Pre-training with Rich Supervisions

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gao, Yuan; Shi, Kunyu; Zhu, Pengkai; Belval, Edouard; Nuriel, Oren; Appalaraju, Srikar; Ghadar, Shabnam; Tu, Zhuowen; Mahadevan, Vijay; Soatto, Stefano

Affiliations: Stanford Univ Stanford CA 94305 USA; AWS AI Labs Seattle WA USA; Amazon Seattle WA 98109 USA

ISBN: (print) 9798350353006

We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in using image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localization to carefully design 10 pre-training tasks with large scale annotated data. These tasks resemble down-stream tasks across different domains and the annotations are cheap to obtain. We demonstrate that, compared to current screenshot pre-training objectives, our innovative pre-training method significantly enhances performance of image-to-text model in nine varied and popular downstream tasks - up to 76.1% improvements on Table Detection, and at least 1% on Widget Captioning.

Keywords: pre-training; UI understanding; vision language models

Semantics-aware Motion Retargeting with Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Haodong; Chen, Zhike; Xu, Haocheng; Hao, Lei; Wu, Xiaofei; Xu, Songcen; Zhang, Zhensong; Wang, Yue; Xiong, Rong

Affiliations: Zhejiang Univ Hangzhou Peoples R China; Huawei Noahs Ark Lab Montreal PQ Canada

ISBN: (print) 9798350353013; 9798350353006

Capturing and preserving motion semantics is essential to motion retargeting between animation characters. However, most of the previous works neglect the semantic information or rely on human-designed joint-level representations. Here, we present a novel Semantics-aware Motion reTargeting (SMT) method with the advantage of vision-language models to extract and maintain meaningful motion semantics. We utilize a differentiable module to render 3D motions. Then the high-level motion semantics are incorporated into the motion retargeting process by feeding the vision-language model with the rendered images and aligning the extracted semantic embeddings. To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints. Experimental results show the effectiveness of the proposed method in producing high-quality motion retargeting results while accurately preserving motion semantics. Project page can be found at https://***/view/smtnet.

Keywords: Animation; Motion Retargeting; Vision-Language Model

Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: de Geus, Daan; Dubbelman, Gijs

Affiliations: Eindhoven Univ Technol Eindhoven Netherlands

ISBN: (print) 9798350353013; 9798350353006

Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object. Existing methods approach PPS by separately conducting object-level and part-level segmentation. However, their part-level predictions are not linked to individual parent objects. Therefore, their learning objective is not aligned with the PPS task objective, which harms the PPS performance. To solve this, and make more accurate PPS predictions, we propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). This method uses a set of shared queries to jointly predict (a) object-level segments, and (b) the part-level segments within those same objects. As a result, TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations. With experiments, we show that TAPPS considerably outperforms methods that predict objects and parts separately, and achieves new state-of-the-art PPS results.

Keywords: computer vision; image segmentation; panoptic segmentation; part segmentation; part-aware panoptic segmentation; scene understanding; semantic segmentation

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Leheng; Li, Yawei; Zhou, Xingyu; Zhao, Xiaorui; Gu, Shuhang

Affiliations: Univ Elect Sci & Technol China Chengdu Peoples R China; Swiss Fed Inst Technol Comp Vis Lab Zurich Switzerland; Swiss Fed Inst Technol Integrated Syst Lab Zurich Switzerland

ISBN: (print) 9798350353013; 9798350353006

Single Image Super-Resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges still remain, particularly in limited receptive field caused by window-based self-attention. To address these issues, we introduce a group of auxiliary Adaptive Token Dictionary to SR Transformer and establish an ATD-SR method. The introduced token dictionary could learn prior information from training data and adapt the learned prior to specific testing image through an adaptive refinement step. The refinement strategy could not only provide global information to all input tokens but also group image tokens into categories. Based on category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens for enhancing input features. The experimental results show that our method achieves the best performance on various single image super-resolution benchmarks.

Keywords: dictionary learning; image super-resolution; vision transformer

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Tian, Xinyu; Zou, Shu; Yang, Zhaoyuan; Zhang, Jing

Affiliations: Australian Natl Univ Canberra ACT Australia; GE Res Niskayuna NY USA

ISBN: (print) 9798350353006

Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts. We address this issue with Attribute-Guided Prompt Tuning (ArGue), making three key contributions. 1) In contrast to the conventional approach of directly appending soft prompts preceding class names, we align the model with primitive visual attributes generated by Large Language Models (LLMs). We posit that a model's ability to express high confidence in these attributes signifies its capacity to discern the correct class rationales. 2) We introduce attribute sampling to eliminate disadvantageous attributes, thus only semantically meaningful attributes are preserved. 3) We propose negative prompting, explicitly enumerating class-agnostic attributes to activate spurious correlations and encourage the model to generate highly orthogonal probability distributions in relation to these negative features. In experiments, our method significantly outperforms current state-of-the-art prompt tuning methods on both novel class prediction and out-of-distribution generalization tasks. The code is available at https://***/Liam-Tian/ArGue.

Keywords: few-shot adaptation; prompt tuning; vision-language model

LLaFS: When Large Language Models Meet Few-Shot Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhu, Lanyun; Chen, Tianrun; Ji, Deyi; Ye, Jieping; Liu, Jun

Affiliations: Singapore Univ Technol & Design Singapore Singapore; Zhejiang Univ Hangzhou Peoples R China; Alibaba Grp Hangzhou Peoples R China

ISBN: (print) 9798350353013; 9798350353006

This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in few-shot segmentation. In contrast to the conventional few-shot segmentation methods that only rely on the limited and biased information from the annotated support images, LLaFS leverages the vast prior knowledge gained by LLM as an effective supplement and directly uses the LLM to segment images in a few-shot manner. To enable the text-based LLM to handle image-related tasks, we carefully design an input instruction that allows the LLM to produce segmentation results represented as polygons, and propose a region-attribute table to simulate the human visual mechanism and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization. LLaFS achieves state-of-the-art results on multiple datasets, showing the potential of using LLMs for few-shot computer vision tasks.

Keywords: Few-shot segmentation; Large vision-language models

3DInAction: Understanding Human Actions in 3D Point Clouds

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ben-Shabat, Yizhak; Shrout, Oren; Gould, Stephen

Affiliations: Australian Natl Univ Canberra ACT Australia; Technion Israel Inst Technol Haifa Israel

ISBN: (print) 9798350353006

We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored despite the clear value that 3D information may bring. This is mostly due to the inherent limitations of the point cloud data modality (lack of structure, permutation invariance, and varying number of points), which make it difficult to learn a spatio-temporal representation. To address these limitations, we propose the 3DinAction pipeline that first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://***/sitzikbs/3dincaction.

Keywords: 3D action recognition; point clouds; spatio-temporal representation; temporal patches

You Only Need Less Attention at Each Stage in Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Shuoxi; Liu, Hanpeng; Lin, Stephen; He, Kun

Affiliations: Huazhong Univ Sci & Technol Wuhan Peoples R China; Microsoft Res Asia Beijing Peoples R China

ISBN: (print) 9798350353013; 9798350353006

The advent of Vision Transformers (ViTs) marks a substantial paradigm shift in the realm of computer vision. ViTs capture the global information of images through self-attention modules, which perform dot product computations among patchified image tokens. While self-attention modules empower ViTs to capture long-range dependencies, the computational complexity grows quadratically with the number of tokens, which is a major hindrance to the practical application of ViTs. Moreover, the self-attention mechanism in deep ViTs is also susceptible to the attention saturation issue. Accordingly, we argue against the necessity of computing the attention scores in every layer, and we propose the Less-Attention Vision Transformer (LaViT), which computes only a few attention operations at each stage and calculates the subsequent feature alignments in other layers via attention transformations that leverage the previously calculated attention scores. This novel approach can mitigate two primary issues plaguing traditional self-attention modules: the heavy computational burden and attention saturation. Our proposed architecture offers superior efficiency and ease of implementation, merely requiring matrix multiplications that are highly optimized in contemporary deep learning frameworks. Moreover, our architecture demonstrates exceptional performance across various vision tasks including classification, detection and segmentation.

Keywords: computer vision; efficient training; vision transformer
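
The quadratic cost mentioned in the abstract can be seen in a plain scaled dot-product attention computation. The sketch below shows standard self-attention scores in pure Python, not the LaViT method itself; the function names and the toy token values are assumptions for illustration.

```python
import math


def attention_scores(queries, keys):
    """Return the row-wise softmax of Q K^T / sqrt(d) as an n x n nested list."""
    d = len(queries[0])
    scores = []
    for q in queries:
        # One dot product per (query, key) pair: n * n in total.
        row = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        row_max = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) for s in row]
        total = sum(exps)
        scores.append([e / total for e in exps])
    return scores


def num_dot_products(num_tokens: int) -> int:
    """Number of query-key dot products for self-attention over n tokens."""
    return num_tokens * num_tokens


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = attention_scores(tokens, tokens)
print(len(scores), "x", len(scores[0]))
print(num_dot_products(196), num_dot_products(392))
```

Doubling the token count from 196 to 392 quadruples the number of dot products (38,416 to 153,664), which is the growth that motivates computing fewer attention operations.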


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
