Search Results - Inner Mongolia University Library

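Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011-2016)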
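
The search rules above can be sketched as a small query builder. This is only an illustration: the helper names `term`, `combine`, and `group` are hypothetical, and the checks run on the client side; whether the catalogue itself validates queries this way is not stated on the page.

```python
# Field codes and operators taken from the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join terms with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Passing a lowercase operator such as `"and"` raises a `ValueError`, which mirrors the rule that operators must be uppercase.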


Search query: "Any field = IEEE Conference on Computer Vision and Pattern Recognition Workshops"

23,198 records in total; showing records 301-310.
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Shah, Nisarg A.; Vibashan, V. S.; Patel, Vishal M.

Affiliation: Johns Hopkins Univ Baltimore MD 21218 USA

ISBN: 9798350353006 (print)

Referring Image Segmentation (RIS) aims to segment objects from an image based on a language description. Recent advancements have introduced transformer-based methods that leverage cross-modal dependencies, significantly enhancing performance in referring segmentation tasks. These methods are designed such that each query predicts different masks. However, RIS inherently requires a single-mask prediction, leading to a phenomenon known as Query Collapse, where all queries yield the same mask prediction. This reduces the generalization capability of the RIS model for complex or novel scenarios. To address this issue, we propose a Multi-modal Query Feature Fusion technique, characterized by two innovative designs: (1) Gaussian enhanced Multi-Modal Fusion, a novel visual grounding mechanism that enhances overall representation by extracting rich local visual information and global visual-linguistic relationships, and (2) A Dynamic Query Module that produces a diverse set of queries through a scoring network where the network selectively focuses on queries for objects referred to in the language description. Moreover, we show that including an auxiliary loss to increase the distance between mask representations of different queries further enhances performance and mitigates query collapse. Extensive experiments conducted on four benchmark datasets validate the effectiveness of our framework.

Keywords: Image segmentation; Multimodal Transformer; Vision-Language

RMT: Retentive Networks Meet Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Fan, Qihang; Huang, Huaibo; Chen, Mingrui; Liu, Hongmin; He, Ran

Affiliations: Chinese Acad Sci Inst Automat MAIS & CRIPAC Beijing Peoples R China; Univ Chinese Acad Sci Sch Artificial Intelligence Beijing Peoples R China; Univ Sci & Technol Beijing Beijing Peoples R China

ISBN: 9798350353013; 9798350353006 (print)

Vision Transformer (ViT) has gained increasing attention in the computer vision community in recent years. However, the core component of ViT, Self-Attention, lacks explicit spatial priors and bears a quadratic computational complexity, thereby constraining the applicability of ViT. To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes. Specifically, we extend the RetNet's temporal decay mechanism to the spatial domain, and propose a spatial decay matrix based on the Manhattan distance to introduce the explicit spatial prior to Self-Attention. Additionally, an attention decomposition form that adeptly adapts to explicit spatial prior is proposed, aiming to reduce the computational burden of modeling global information without disrupting the spatial decay matrix. Based on the spatial decay matrix and the attention decomposition form, we can flexibly integrate explicit spatial prior into the vision backbone with linear complexity. Extensive experiments demonstrate that RMT exhibits exceptional performance across various vision tasks. Specifically, without extra training data, RMT achieves 84.8% and 86.1% top-1 acc on ImageNet-1k with 27M/4.5GFLOPs and 96M/18.2GFLOPs. For downstream tasks, RMT achieves 54.5 box AP and 47.2 mask AP on the COCO detection task, and 52.8 mIoU on the ADE20K semantic segmentation task.
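
As a rough illustration of a spatial decay matrix based on the Manhattan distance, the sketch below fills entry (n, m) with `gamma` raised to the Manhattan distance between grid positions n and m. The exact form is an assumption, since the abstract does not spell it out; the function name `manhattan_decay_matrix` and the value of `gamma` are hypothetical, and the sketch does not show how the matrix is combined with Self-Attention.

```python
from typing import List


def manhattan_decay_matrix(height: int, width: int, gamma: float) -> List[List[float]]:
    """Return a (height*width) x (height*width) matrix of gamma ** Manhattan distance.

    Assumed form for illustration only; positions are enumerated row by row.
    """
    positions = [(y, x) for y in range(height) for x in range(width)]
    return [
        [gamma ** (abs(y1 - y2) + abs(x1 - x2)) for (y2, x2) in positions]
        for (y1, x1) in positions
    ]


# A 2 x 2 grid of tokens with an illustrative decay rate of 0.5.
decay = manhattan_decay_matrix(2, 2, 0.5)
for row in decay:
    print(row)
```

Under this assumed form, nearby tokens get weights close to 1 and distant tokens decay geometrically, which is one way an explicit spatial prior could be encoded.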

Keywords: Vision Transformer

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Guo, Yangyang; Wang, Guangzhi; Kankanhalli, Mohan

Affiliation: Natl Univ Singapore Singapore Singapore

ISBN: 9798350353006 (print)

Applying a pre-trained large model to downstream tasks is prohibitive under resource-constrained conditions. Recent dominant approaches for addressing efficiency issues involve adding a few learnable parameters to the fixed backbone model. This strategy, however, leads to more challenges in loading large models for downstream fine-tuning with limited resources. In this paper, we propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage. To this end, we first employ low-rank approximation to compress the original large model and then devise a feature distillation module and a weight perturbation regularization module. These modules are specifically designed to enhance the low-rank model. In particular, we update only the low-rank model while freezing the backbone parameters during pre-training. This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks. The proposed method achieves both efficiencies in terms of required parameters and computation time while maintaining comparable results with minimal modifications to the backbone architecture. Specifically, when applied to three vision-only and one vision-language Transformer models, our approach often demonstrates a decrease of merely about 0.6 points in performance while reducing the original parameter size by 1/3 to 2/3. We release our code at link.

Keywords: Knowledge Distillation; Low-rank Approximation; Vision-Language

User-Guided Variable Rate Learned Image Compression

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gupta, Rushil; Suryateja, B., V; Kapoor, Nikhil; Jaiswal, Rajat; Nangi, Sharmila; Kulkarni, Kuldeep

Affiliations: Adobe Res Bengaluru India; Indian Inst Technol Delhi Delhi India; Stanford Univ Stanford CA 94305 USA

ISBN: 9781665487399 (digital); 9781665487399 (print)

We propose a learning-based image compression method that achieves any arbitrary input bitrate via user-guided bit allocation to preferred regions. We verify our hypothesis of incorporating user guidance for bitrate control by experimenting with alternatives that do not have any guidance. We conduct extensive evaluation on CelebA-HQ and CityScapes dataset using standard quantitative metrics and human studies showing that our single model for multiple bitrates achieves similar or better performance as compared to previous learned image compression methods that require re-training for each new bitrate.

Keywords: Measurement; Computer vision; Image coding; Conferences; Computational modeling; Bit rate; Pattern recognition

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Peng, Wujian; Xi, Sicheng; You, Zuyao; Lan, Shiyi; Wu, Zuxuan

Affiliations: Fudan Univ Sch CS Shanghai Key Lab Intell Info Proc Shanghai Peoples R China; Shanghai Collaborat Innovat Ctr Intelligent Visua Shanghai Peoples R China; NVIDIA Shenzhen Guangdong Peoples R China

ISBN: 9798350353006 (print)

Vision language models (VLMs) have demonstrated remarkable performance across various downstream tasks. However, understanding fine-grained visual-linguistic concepts, such as attributes and inter-object relationships, remains a significant challenge. While several benchmarks aim to evaluate VLMs in finer granularity, their primary focus remains on the linguistic aspect, neglecting the visual dimension. Here, we highlight the importance of evaluating VLMs from both a textual and visual perspective. We introduce a progressive pipeline to synthesize images that vary in a specific attribute while ensuring consistency in all other aspects. Utilizing this data engine, we carefully design a benchmark, SPEC, to diagnose the comprehension of object size, position, existence, and count. Subsequently, we conduct a thorough evaluation of four leading VLMs on SPEC. Surprisingly, their performance is close to random guess, revealing significant limitations. With this in mind, we propose a simple yet effective approach to optimize VLMs in fine-grained understanding, achieving significant improvements on SPEC without compromising the zero-shot performance. Results on two additional fine-grained benchmarks also show consistent improvements, further validating the transferability of our approach. Code and data are available at https://***/wjpoom/SPEC.

Keywords: Fine-grained understanding; Vision language model

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xia, Chunlong; Wang, Xinliang; Lv, Feng; Hao, Xin; Shi, Yifeng

Affiliation: Baidu Inc Beijing Peoples R China

ISBN: 9798350353013; 9798350353006 (print)

Although Vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT backbone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. (3) We evaluate the performance of ViT-CoMer across various dense prediction tasks, different frameworks, and multiple advanced pre-training. Notably, our ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val, both of which are comparable to state-of-the-art methods. We hope ViT-CoMer can serve as a new backbone for dense prediction tasks to facilitate future research. The code will be released at https://***/Traffic-X/ViT-CoMer.

Keywords: Dense Prediction; Vision Foundation Backbone; Vision Transformer

Robustness and Adaptation to Hidden Factors of Variation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Paul, William; Burlina, Philippe

Affiliation: Johns Hopkins Univ Appl Phys Lab Laurel MD 20723 USA

ISBN: 9781665487399 (digital); 9781665487399 (print)

We tackle here a specific, still not widely addressed aspect, of AI robustness, which consists of seeking invariance / insensitivity of model performance to hidden factors of variations in the data. Towards this end, we employ a two step strategy that a) does unsupervised discovery, via generative models, of sensitive factors that cause models to under-perform, and b) intervenes models to make their performance invariant to these sensitive factors' influence. We consider 3 separate interventions for robustness, including: data augmentation, semantic consistency, and adversarial alignment. We evaluate our method using metrics that measure trade offs between invariance (insensitivity) and overall performance (utility) and show the benefits of our method for 3 settings (unsupervised, semi-supervised and generalization).

Keywords: Measurement; Computer vision; Conferences; Semantics; Robustness; Data models; Pattern recognition

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ofir, Amir; Ben-Artzi, Gil

Affiliation: Ariel Univ Ariel Israel

ISBN: 9781665487399 (digital); 9781665487399 (print)

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrices multiplications, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
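
For orientation, the conventional baseline that the abstract contrasts against (im2col followed by a matrix product) can be sketched in plain Python. This is not the authors' SMM-Conv method. It assumes a single channel, stride 1, and no padding, and the function names are hypothetical. As is usual in deep learning, "convolution" here means cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def im2col(image: Matrix, kh: int, kw: int) -> Matrix:
    """Pack every kh x kw patch of the image into one column (stride 1, no padding)."""
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    columns: Matrix = [[] for _ in range(kh * kw)]
    for i in range(out_h):
        for j in range(out_w):
            row = 0
            for di in range(kh):
                for dj in range(kw):
                    columns[row].append(image[i + di][j + dj])
                    row += 1
    return columns  # shape: (kh * kw) x (out_h * out_w)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product standing in for an optimized GEMM."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def conv2d_im2col(image: Matrix, kernel: Matrix) -> Matrix:
    """Convolve by flattening the kernel to one row and multiplying with the patch matrix."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    weights = [[w for kernel_row in kernel for w in kernel_row]]  # 1 x (kh * kw)
    flat = matmul(weights, im2col(image, kh, kw))[0]
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
patches = im2col(image, 2, 2)
result = conv2d_im2col(image, kernel)
print(result)
```

In this toy case the patch matrix holds 16 values for a 9-pixel image, a small-scale view of the memory buffer drawback mentioned in the abstract.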

Keywords: Deep learning; Computer vision; Conferences; Memory management; Network architecture; Pattern recognition; Kernel

On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zanella, Maxime; Ben Ayed, Ismail

Affiliations: UCLouvain Louvain Belgium; UMons Mons Belgium; ETS Montreal Montreal PQ Canada

ISBN: 9798350353006 (print)

The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly, test-time augmentation, which utilizes multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts toward test-time prompt tuning. In contrast, we introduce a robust MeanShift for Test-time Augmentation (MTA), which surpasses prompt-based methods without requiring this intensive training procedure. This positions MTA as an ideal solution for both standalone and API-based applications. Additionally, our method does not rely on ad hoc rules (e.g., confidence threshold) used in some previous test-time augmentation techniques to filter the augmented views. Instead, MTA incorporates a quality assessment variable for each view directly into its optimization process, termed as the inlierness score. This score is jointly optimized with a density mode seeking process, leading to an efficient training- and hyperparameter-free approach. We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency. Deployed easily as plug-and-play module on top of zero-shot models and state-of-the-art few-shot methods, MTA shows systematic and consistent improvements.

Keywords: CLIP; test-time augmentation; training-free; vision-language; zero-shot

Compositional Chain-of-Thought Prompting for Large Multimodal Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mitra, Chancharik; Huang, Brandon; Darrell, Trevor; Herzig, Roei

Affiliation: Univ Calif Berkeley Berkeley CA 94720 USA

ISBN: 9798350353006 (print)

The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks. However, recent research has shown that even the most advanced LMMs still struggle to capture aspects of compositional visual reasoning, such as attributes and relationships between objects. One solution is to utilize scene graphs (SGs), a formalization of objects and their relations and attributes that has been extensively used as a bridge between the visual and textual domains. Yet, scene graph data requires scene graph annotations, which are expensive to collect and thus not easily scalable. Moreover, finetuning an LMM based on SG data can lead to catastrophic forgetting of the pretraining objective. To overcome this, inspired by chain-of-thought methods, we propose Compositional Chain-of-Thought (CCoT), a novel zero-shot Chain-of-Thought prompting method that utilizes SG representations in order to extract compositional knowledge from an LMM. Specifically, we first generate an SG using the LMM, and then use that SG in the prompt to produce a response. Through extensive experiments, we find that the proposed CCoT approach not only improves LMM performance on several vision and language (VL) compositional benchmarks but also improves the performance of several popular LMMs on general multimodal benchmarks, without the need for fine-tuning or annotated ground-truth SGs. Code: https://***/chancharikmitra/CCoT.

Keywords: Compositionality; Large Multimodal Models; Multimodality; Prompting; Scene Graphs; Vision & Language


Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
