Search Results - Inner Mongolia University Library

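Advanced search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1 (subject "library science" or "information science", author 范并思, years 1982-2016): (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2 (publication 计算机应用与软件, any field containing C++ or Basic, subject not Visual, years 2011-2016): P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016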
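
The search rules above can be sketched as a small query builder. This is only an illustration, assuming the query is an ordinary string typed into the search box; the helper names `term`, `combine`, and `group` are hypothetical and are not part of the library system. The sketch checks field codes against the list above and always emits uppercase operators with one space on each side.

```python
# Hypothetical helpers: they only compose query strings in the syntax
# described by the help text; they do not talk to the library system.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join terms with an uppercase operator padded by one space per side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    if not parts:
        raise ValueError("at least one term is required")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
subjects = group(combine("OR", term("K", "图书馆学"), term("K", "情报学")))
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Building the string from validated parts avoids the two mistakes the rules warn about: lowercase operators and missing spaces around them.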

Search condition: "Any field = 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020"

11,281 records in total; showing records 131-140

RegionGPT: Towards Region Understanding Vision Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Guo, Qiushan; De Mello, Shalini; Yin, Hongxu; Byeon, Wonmin; Cheung, Ka Chun; Yu, Yizhou; Luo, Ping; Liu, Sifei
Affiliations: Univ Hong Kong, Hong Kong, Peoples R China; NVIDIA, San Francisco, CA, USA

ISBN: 9798350353006 (print)

Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to the limited spatial awareness of the vision encoder and the use of coarse-grained training data that lacks detailed, region-specific captions. To address this, we introduce RegionGPT (RGPT for short), a novel framework designed for complex region-level captioning and understanding. RGPT enhances the spatial awareness of regional representation with simple yet effective modifications to existing visual encoders in VLMs. We further improve performance on tasks requiring a specific output scope by integrating task-guided instruction prompts during both training and inference phases, while maintaining the model's versatility for general-purpose tasks. Additionally, we develop an automated region caption data generation pipeline, enriching the training set with detailed region-level captions. We demonstrate that a universal RGPT model can be effectively applied to, and significantly enhances performance across, a range of region-level tasks, including but not limited to complex region descriptions, reasoning, object classification, and referring expressions comprehension. Code will be released at the project page.

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rodin, Ivan; Furnari, Antonino; Min, Kyle; Tripathi, Subarna; Farinella, Giovanni Maria
Affiliations: Univ Catania, Catania, Italy; Intel Labs, Hillsboro, OR, USA

ISBN: 9798350353006 (print)

We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually-annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how actions unfold in time. Through a novel annotation procedure, we extend the Ego4D dataset by adding manually labeled Egocentric Action Scene Graphs, which offer a rich set of annotations for long-form egocentric video understanding. We hence define the EASG generation task and provide a baseline approach, establishing preliminary benchmarks. Experiments on two downstream tasks, action anticipation and activity summarization, highlight the effectiveness of EASGs for long-form egocentric video understanding. We will release the dataset and code to replicate experiments and annotations. The code is available at https://***/fpv-iplab/EASG.

Keywords: egocentric vision; long-form video understanding; scene graphs

Efficient Test-Time Adaptation of Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Karmanov, Adilbek; Guan, Dayan; Lu, Shijian; El Saddik, Abdulmotaleb; Xing, Eric
Affiliations: Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates; Nanyang Technol Univ, Singapore, Singapore; Univ Ottawa, Ottawa, ON, Canada; Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: 9798350353006 (print)

Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts at test time. Though prior studies have achieved very promising performance, they involve intensive computation, which is severely unaligned with test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging the key-value cache, TDA allows adapting to test data gradually via progressive pseudo label refinement, which is highly efficient and incurs no backpropagation. In addition, we introduce negative pseudo labeling, which alleviates the adverse impact of pseudo label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo label predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency compared with the state of the art. The code has been released at https://***/tda/.

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lee, Sanghyeok; Choi, Joonmyung; Kim, Hyunwoo J.
Affiliations: Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea

ISBN: 9798350353006 (print)

Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works face a speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss. In this paper, we propose Multi-criteria Token Fusion (MCTF), which gradually fuses the tokens based on multiple criteria (i.e., similarity, informativeness, and size of fused tokens). Further, we utilize one-step-ahead attention, an improved approach to capturing the informativeness of the tokens. By training the model equipped with MCTF using a token reduction consistency, we achieve the best speed-accuracy trade-off in image classification (ImageNet1K). Experimental results show that MCTF consistently surpasses the previous reduction methods with and without training. Specifically, DeiT-T and DeiT-S with MCTF reduce FLOPs by about 44% while improving the performance (+0.5% and +0.3%, respectively) over the base model. We also demonstrate the applicability of MCTF in various Vision Transformers (e.g., T2T-ViT, LV-ViT), achieving at least 31% speedup without performance degradation. Code is available at https://***/mlvlab/MCTF.

Keywords: Efficient ViTs; Token Fusion; Token Merging; Token Reduction

Towards 3D Vision with Low-Cost Single-Photon Cameras

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mu, Fangzhou; Sifferman, Carter; Jungerman, Sacha; Li, Yiquan; Han, Mark; Gleicher, Michael; Gupta, Mohit; Li, Yin
Affiliations: Univ Wisconsin, Madison, WI 53706, USA

ISBN: 9798350353013; 9798350353006 (print)

We present a method for reconstructing the 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras. These cameras, operating as time-resolved image sensors, illuminate the scene with a very fast pulse of diffuse light and record the shape of that pulse as it returns from the scene at a high temporal resolution. We propose to model this image formation process, account for its non-idealities, and adapt neural rendering to reconstruct 3D geometry from a set of spatially distributed sensors with known poses. We show that our approach can successfully recover complex 3D shapes from simulated data. We further demonstrate 3D object reconstruction from real-world captures, utilizing measurements from a commodity proximity sensor. Our work draws a connection between image-based modeling and active range scanning, and offers a step towards 3D vision with single-photon cameras. Our project webpage is at https://***/towards_3d_vision/.

SkipPLUS: Skip the First Few Layers to Better Explain Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mehri, Faridoun; Fayyaz, Mohsen; Baghshah, Mahdieh Soleymani; Pilehvar, Mohammad Taher
Affiliations: Sharif Univ Technol, Tehran, Iran; Univ Tehran, Tehran, Iran; Cardiff Univ, Cardiff, Wales

ISBN: 9798350365474 (print)

Despite their remarkable performance, the explainability of Vision Transformers (ViTs) remains a challenge. While forward attention-based token attribution techniques have become popular in text processing, their suitability for ViTs has not been extensively explored. In this paper, we compare these methods against state-of-the-art input attribution methods from the vision literature, revealing their limitations due to improper aggregation of information across layers. To address this, we introduce two general techniques, PLUS and SkipPLUS, that can be composed with any input attribution method to more effectively aggregate information across layers while handling noisy layers. Through comprehensive and quantitative evaluations of faithfulness and human interpretability on a variety of ViT architectures and datasets, we demonstrate the effectiveness of PLUS and SkipPLUS, establishing a new state of the art in white-box token attribution. We conclude with a comparative analysis highlighting the strengths and weaknesses of the best versions of all the studied methods. The code used in this paper is freely available at https://***/NightMachinery/SkipPLUS-cvpr-2024.

Keywords: Explainable AI; Forward Attention-Based Token Attribution; Interpretability; Neural Network Visualization; Vision Transformers; White-Box Input Attribution Methods; xAI

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Khan, Zaid; Fu, Yun
Affiliations: Northeastern Univ, Boston, MA 02115, USA

ISBN: 9798350353006 (print)

The goal of selective prediction is to allow a model to abstain when it may not be able to deliver a reliable prediction, which is important in safety-critical contexts. Existing approaches to selective prediction typically require access to the internals of a model, require retraining a model, or study only unimodal models. However, the most powerful models (e.g. GPT-4) are typically only available as black boxes with inaccessible internals, are not retrainable by end-users, and are frequently used for multimodal tasks. We study the possibility of selective prediction for vision-language models in a realistic, black-box setting. We propose using the principle of neighborhood consistency to identify unreliable responses from a black-box vision-language model in question answering tasks. We hypothesize that given only a visual question and model response, the consistency of the model's responses over the neighborhood of a visual question will indicate reliability. It is impossible to directly sample neighbors in feature space in a black-box setting. Instead, we show that it is possible to use a smaller proxy model to approximately sample from the neighborhood. We find that neighborhood consistency can be used to identify model responses to visual questions that are likely unreliable, even in adversarial settings or settings that are out-of-distribution to the proxy model.

Keywords: predictive uncertainty; selective prediction; trustworthy ml; vision-language; visual question answering

Sequential Modeling Enables Scalable Learning for Large Vision Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bail, Yutong; Geng, Xinyang; Mangalam, Karttikeya; Bar, Amir; Yuille, Alan L.; Darrell, Trevor; Malik, Jitendra; Efros, Alexei A.
Affiliations: UC Berkeley, BAIR, Berkeley, CA 94720, USA; Johns Hopkins Univ, Baltimore, MD 21218, USA

ISBN: 9798350353006 (print)

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.

Keywords: pretraining; scaling; Self-supervised Learning

GRAM: Global Reasoning for Multi-Page VQA

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Blau, Tsachi; Fogel, Sharon; Ronen, Roi; Goltst, Alona; Per, Shahar Tsi; Ben Avraham, Elad; Aberdam, Aviad; Ganz, Roy; Litman, Ron
Affiliations: Technion, Haifa, Israel; AWS AI Labs, Shanghai, Peoples R China

ISBN: 9798350353006 (print)

The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally heavy pretraining. To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens, facilitating the flow of information across pages for global reasoning. To ensure that our model utilizes the newly introduced document tokens, we propose a tailored bias adaptation method. For additional computational savings during decoding, we introduce an optional compression stage using our compression-transformer (C-Former), reducing the encoded sequence length, thereby allowing a trade-off between quality and latency. Extensive experiments showcase GRAM's state-of-the-art performance on the benchmarks for multi-page DocVQA, demonstrating the effectiveness of our approach.

Keywords: Document Understanding; Long Sequence Processing; Vision Language Models

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sheynin, Shelly; Polyak, Adam; Singer, Uriel; Kirstain, Yuval; Zohar, Amit; Ashual, Oron; Parikh, Devi; Taigman, Yaniv
Affiliations: Meta GenAI, Menlo Pk, CA 94025, USA

ISBN: 9798350353013; 9798350353006 (print)

Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit, we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and computer vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit's multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both these elements are essential for Emu Edit's outstanding performance. Furthermore, we show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples. This capability offers a significant advantage in scenarios where high-quality samples are scarce. Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.

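Inner Mongolia University Library, 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021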
