检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

22,771 篇 会议
112 篇 期刊文献
23 册 图书

馆藏范围

22,905 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,398 篇 工学
- 10,880 篇 计算机科学与技术...
- 3,450 篇 软件工程
- 2,430 篇 机械工程
- 1,721 篇 光学工程
- 1,010 篇 控制科学与工程
- 998 篇 电气工程
- 761 篇 信息与通信工程
- 393 篇 仪器科学与技术
- 337 篇 生物工程
- 257 篇 生物医学工程（可授...
- 215 篇 电子科学与技术（可...
- 113 篇 化学工程与技术
- 112 篇 安全科学与工程
- 98 篇 测绘科学与技术
- 92 篇 交通运输工程
- 86 篇 建筑学
- 82 篇 土木工程
3,362 篇 医学
- 3,348 篇 临床医学
- 79 篇 基础医学(可授医学...
3,250 篇 理学
- 1,953 篇 物理学
- 1,664 篇 数学
- 567 篇 统计学（可授理学、...
- 484 篇 生物学
- 245 篇 系统科学
- 109 篇 化学
506 篇 管理学
- 299 篇 图书情报与档案管...
- 219 篇 管理科学与工程(可...
- 75 篇 工商管理
252 篇 艺术学
- 252 篇 设计学（可授艺术学...
62 篇 法学
- 59 篇 社会学
40 篇 农学
25 篇 教育学
19 篇 经济学
11 篇 军事学
3 篇 文学

主题

10,126 篇 computer vision
4,025 篇 pattern recognit...
2,900 篇 training
1,958 篇 computational mo...
1,792 篇 cameras
1,758 篇 visualization
1,485 篇 shape
1,466 篇 image segmentati...
1,447 篇 feature extracti...
1,412 篇 three-dimensiona...
1,288 篇 robustness
1,169 篇 computer archite...
1,144 篇 layout
1,142 篇 computer science
1,134 篇 semantics
1,071 篇 object detection
1,043 篇 conferences
1,009 篇 benchmark testin...
967 篇 codes
810 篇 face recognition

机构

135 篇 univ sci & techn...
118 篇 univ chinese aca...
118 篇 chinese univ hon...
110 篇 carnegie mellon ...
99 篇 tsinghua univers...
99 篇 microsoft resear...
94 篇 swiss fed inst t...
92 篇 zhejiang univ pe...
82 篇 university of sc...
81 篇 zhejiang univers...
77 篇 shanghai ai lab ...
77 篇 university of ch...
72 篇 shanghai jiao to...
68 篇 microsoft res as...
65 篇 national laborat...
65 篇 alibaba grp peop...
64 篇 tsinghua univ pe...
63 篇 adobe research
60 篇 peking univ peop...
59 篇 peng cheng labor...

作者

78 篇 van gool luc
72 篇 timofte radu
63 篇 zhang lei
45 篇 luc van gool
40 篇 yang yi
37 篇 loy chen change
33 篇 xiaoou tang
33 篇 li stan z.
33 篇 qi tian
32 篇 sun jian
31 篇 liu yang
31 篇 li fei-fei
30 篇 chen chen
30 篇 tian qi
30 篇 pascal fua
29 篇 darrell trevor
28 篇 ying shan
27 篇 li xin
27 篇 vasconcelos nuno
27 篇 hanqing lu

语言

22,844 篇 英文
35 篇 其他
20 篇 中文
5 篇 土耳其文
2 篇 日文

检索条件"任意字段=1994 IEEE Computer-Society Conference on Computer Vision and Pattern Recognition"

共 22906 条记录，以下是251-260 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

Convolutional Prompting meets Language Models for Continual Learning

Convolutional Prompting meets Language Models for Continual ...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Roy, Anurag Moulick, Riddhiman Verma, Vinay K. Ghosh, Saptarshi Das, Abir IIT Kharagpur Kharagpur W Bengal India IML Amazon India Hyderabad India

ISBN: (纸本)9798350353006

Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts which can be inefficient in sharing knowledge across tasks leading to inferior performance. In addition, the lack of fine-grained layer specific prompts does not allow these to fully express the strength of the prompts for CL. We address these limitations by proposing ConvPrompt, a novel convolutional prompt creation mechanism that maintains layer-wise shared embeddings, enabling both layer-specific learning and better concept transfer across tasks. The intelligent use of convolution enables us to maintain a low parameter overhead without compromising performance. We further leverage Large Language Models to generate fine-grained text descriptions of each category which are used to get task similarity and dynamically decide the number of prompts to be learned. Extensive experiments demonstrate the superiority of ConvPrompt and improves SOTA by 3% with significantly less parameter overhead. We also perform strong ablation over various modules to disentangle the importance of different components.(1)

关键词： Continual Learning Language Models Prompt Tuning vision Transformer

来源：评论

学校读者我要写书评

暂无评论

Finding Lottery Tickets in vision Models via Data-driven Spectral Foresight Pruning

Finding Lottery Tickets in Vision Models via Data-driven Spe...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Iurada, Leonardo Ciccone, Marco Tommasi, Tatiana Politecn Torino Turin Italy

ISBN: (纸本)9798350353006

Recent advances in neural network pruning have shown how it is possible to reduce the computational costs and memory demands of deep learning models before training. We focus on this framework and propose a new pruning at initialization algorithm that leverages the Neural Tangent Kernel (NTK) theory to align the training dynamics of the sparse network with that of the dense one. Specifically, we show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account by providing an analytical upper bound to the NTK's trace obtained by decomposing neural networks into individual paths. This leads to our Path eXclusion (PX), a foresight pruning method designed to preserve the parameters that mostly influence the NTK's trace. PX is able to find lottery tickets (i.e. good paths) even at high sparsity levels and largely reduces the need for additional training. When applied to pre-trained models it extracts subnetworks directly usable for several downstream tasks, resulting in performance comparable to those of the dense counterpart but with substantial cost and computational savings. Code available at: https://***/iurada/px-ntk-pruning

关键词： Efficient computer vision Neural Network Pruning Neural Tangent Kernel Pruning-at-Initialization

来源：评论

学校读者我要写书评

暂无评论

Situational Awareness Matters in 3D vision Language Reasoning

Situational Awareness Matters in 3D Vision Language Reasonin...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Man, Yunze Gui, Liang-Yan Wang, Yu-Xiong Univ Illinois Urbana IL 61801 USA

ISBN: (纸本)9798350353006

Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3D vision language reasoning is the situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based on a language prompt. (2) The agent answers open-ended questions from the perspective of its calculated position. To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning. We tokenize the 3D scene into sparse voxel representation, and propose a language-grounded situation estimator, followed by a situated question answering module. Experiments on the SQA3D and ScanQA datasets show that SIG3D outperforms state-of-the-art models in situational estimation and question answering by a large margin (e.g., an enhancement of over 30% on situation accuracy). Subsequent analysis corroborates our architectural design choices, explores the distinct functions of visual and textual tokens, and highlights the importance of situational awareness in the domain of 3D question-answering. Project page is available at https://***/situation3d.

关键词： vision-Language Multi-modal 3D Reasoning

来源：评论

学校读者我要写书评

暂无评论

AM-RADIO: Agglomerative vision Foundation Model Reduce All Domains Into One

AM-RADIO: Agglomerative Vision Foundation Model Reduce All D...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Ranzinger, Mike Heinrich, Greg Kautz, Jan Molchanov, Pavlo NVIDIA Santa Clara CA 95051 USA

ISBN: (纸本)9798350353006

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. VFMs like CLIP, DINOv2, SAM are trained with distinct objectives, exhibiting unique characteristics for various downstream tasks. We find that despite their conceptual differences, these models can be effectively merged into a unified model through multi-teacher distillation. We name this approach AM-RADIO (Agglomerative Model - Reduce All Domains Into One). This integrative approach not only surpasses the performance of individual teacher models but also amalgamates their distinctive features, such as zero-shot vision-language comprehension, detailed pixel-level understanding, and open vocabulary segmentation capabilities. Additionally, in pursuit of the most hardware-efficient backbone, we evaluated numerous architectures in our multi-teacher distillation pipeline using the same training recipe. This led to the development of a novel architecture (E-RADIO) that exceeds the performance of its predecessors and is at least 6x faster than the teacher models at matched resolution. Our comprehensive benchmarking process covers downstream tasks including ImageNet classification, semantic segmentation linear probing, COCO object detection and integration into LLaVa-1.5. Code: https://***/NVlabs/RADIO.

关键词： Distillation Knowledge Distillation Semantic Segmentation Visual Foundation Model Visual Large Language Models Zero Shot Classification

来源：评论

学校读者我要写书评

暂无评论

Improved Visual Grounding through Self-Consistent Explanations

Improved Visual Grounding through Self-Consistent Explanatio...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： He, Ruozhen Cascante-Bonilla, Paola Yang, Ziyan Berg, Alexander C. Ordonez, Vicente Rice Univ Houston TX 77005 USA Univ Calif Irvine Irvine CA USA

ISBN: (纸本)9798350353006

vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization -"grounding"-abilities of these models can be further improved by fine-tuning for self-consistent visual explanations. We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that encourages self-consistency. Specifically, for an input textual phrase, we attempt to generate a paraphrase and finetune the model so that the phrase and paraphrase map to the same region in the image. We posit that this both expands the vocabulary that the model is able to handle, and improves the quality of the object locations highlighted by gradient-based visual explanation methods (e.g. GradCAM). We demonstrate that SelfEQ improves performance on Flickr30k, ReferIt, and RefCOCO+ over a strong baseline method and several prior works. Particularly, comparing to other methods that do not use any type of box annotations, we obtain 84.07% on Flickr30k ( an absolute improvement of 4.69%), 67.40% on ReferIt (an absolute improvement of 7.68%), and 75.10%, 55.49% on RefCOCO+ test sets A and B respectively (an absolute improvement of 3.74% on average).

关键词： vision and language visual grounding

来源：评论

学校读者我要写书评

暂无评论

ZeroShape: Regression-based Zero-shot Shape Reconstruction

ZeroShape: Regression-based Zero-shot Shape Reconstruction

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Huang, Zixuan Stojanov, Stefan Thai, Anh Jampani, Varun Rehg, James M. Univ Illinois Champaign IL 61820 USA Georgia Inst Technol Atlanta GA 30332 USA Stabil AI London England

ISBN: (纸本)9798350353006

We study the problem of single-image zero-shot 3D shape reconstruction. Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets, but these models are computationally expensive at train and inference time. In contrast, the traditional approach to this problem is regression-based, where deterministic models are trained to directly regress the object shape. Such regression methods possess much higher computational efficiency than generative methods. This raises a natural question: is generative modeling necessary for high performance, or conversely, are regression-based approaches still competitive? To answer this, we design a strong regression-based model, called ZeroShape, based on the converging findings in this field and a novel insight. We also curate a large real-world evaluation benchmark, with objects from three different real-world 3D datasets. This evaluation benchmark is more diverse and an order of magnitude larger than what prior works use to quantitatively evaluate their models, aiming at reducing the evaluation variance in our field. We show that ZeroShape not only achieves superior performance over state-of-the-art methods, but also demonstrates significantly higher computational and data efficiency.(1)

关键词： 3D generation 3D reconstruction 3D vision computer vision Deep Learning

来源：评论

学校读者我要写书评

暂无评论

Differentiable Display Photometric Stereo

Differentiable Display Photometric Stereo

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Choi, Seokjun Yoon, Seungwoo Nam, Giljoo Lee, Seungyong Baek, Seung-Hwan POSTECH Pohang South Korea Meta Real Labs Menlo Pk CA USA

ISBN: (纸本)9798350353006

Photometric stereo leverages variations in illumination conditions to reconstruct surface normals. Display photometric stereo, which employs a conventional monitor as an illumination source, has the potential to overcome limitations often encountered in bulky and difficult-to-use conventional setups. In this paper, we present differentiable display photometric stereo (DDPS), addressing an often over-looked challenge in display photometric stereo: the design of display patterns. Departing from using heuristic display patterns, DDPS learns the display patterns that yield accurate normal reconstruction for a target system in an end-to-end manner. To this end, we propose a differentiable framework that couples basis-illumination image formation with analytic photometric-stereo reconstruction. The differentiable framework facilitates the effective learning of display patterns via auto-differentiation. Also, for training supervision, we propose to use 3D printing for creating a real-world training dataset, enabling accurate reconstruction on the target real-world setup. Finally, we exploit that conventional LCD monitors emit polarized light, which allows for the optical separation of diffuse and specular reflections when combined with a polarization camera, leading to accurate normal reconstruction. Extensive evaluation of DDPS shows improved normal-reconstruction accuracy compared to heuristic patterns and demonstrates compelling properties such as robustness to pattern initialization, calibration errors, and simplifications in image formation and reconstruction.

关键词： Computational illumination computer graphics computer vision Photometric stereo

来源：评论

学校读者我要写书评

暂无评论

Fooling Polarization-based vision using Locally Controllable Polarizing Projection

Fooling Polarization-based Vision using Locally Controllable...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Li, Zhuoxiao Zhong, Zhihang Nobuhara, Shohei Nishino, Ko Zheng, Yinqiang Univ Tokyo Tokyo Japan Shanghai Artificial Intelligence Lab Shanghai Peoples R China Kyoto Univ Kyoto Japan

ISBN: (纸本)9798350353006

Polarization is a fundamental property of light that encodes abundant information regarding surface shape, material, illumination and viewing geometry. The computer vision community has witnessed a blossom of polarization-based vision applications, such as reflection removal, shape-from-polarization (SfP), transparent object segmentation and color constancy, partially due to the emergence of single-chip mono/color polarization sensors that make polarization data acquisition easier than ever. However, is polarization-based vision vulnerable to adversarial attacks? If so, is that possible to realize these adversarial attacks in the physical world, without being perceived by human eyes? In this paper, we warn the community of the vulnerability of polarization-based vision, which can be more serious than RGB-based vision. By adapting a commercial LCD projector, we achieve locally controllable polarizing projection, which is successfully utilized to fool state-of-the-art polarization-based vision algorithms for glass segmentation and SfP. Compared with existing physical attacks on RGB-based vision, which always suffer from the trade-off between attack efficacy and eye conceivability, the adversarial attackers based on polarizing projection are contact-free and visually imperceptible, since naked human eyes can rarely perceive the difference of viciously manipulated polarizing light and ordinary illumination. This poses unprecedented risks on polarization-based vision, for which due attentions should be paid and counter measures be considered.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Test-Time Adaptation for Depth Completion

Test-Time Adaptation for Depth Completion

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Park, Hyoungseob Gupta, Anjali Wong, Alex Yale Vision Lab New Haven CT 06501 USA

ISBN: (纸本)9798350353006

It is common to observe performance degradation when transferring models trained on some (source) datasets to target testing data due to a domain gap between them. Existing methods for bridging this gap, such as domain adaptation (DA), may require the source data on which the model was trained (often not available), while others, i.e., source-free DA, require many passes through the testing data. We propose an online test-time adaptation method for depth completion, the task of inferring a dense depth map from a single image and associated sparse depth map, that closes the performance gap in a single pass. We first present a study on how the domain shift in each data modality affects model performance. Based on our observations that the sparse depth modality exhibits a much smaller covariate shift than the image, we design an embedding module trained in the source domain that preserves a mapping from features encoding only sparse depth to those encoding image and sparse depth. During test time, sparse depth features are projected using this map as a proxy for source domain features and are used as guidance to train a set of auxiliary parameters (i.e., adaptation layer) to align image and sparse depth features from the target test domain to that of the source domain. We evaluate our method on indoor and outdoor scenarios and show that it improves over baselines by an average of 21.1%. Code available at ***/seobbro/TTA-depth-completion.

关键词： 3D computer vision Depth completion Depth estimation Multi-modal fusion Test time adaptation

来源：评论

学校读者我要写书评

暂无评论

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action recognition

Part-aware Unified Representation of Language and Skeleton f...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Zhu, Anqi Ke, Qiuhong Gong, Mingming Bailey, James Univ Melbourne Melbourne Vic Australia Monash Univ Parkville Vic 3052 Australia Wellington Rd Clayton Vic 3800 Australia

ISBN: (纸本)9798350353006

While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains. The source codes can be accessed at https://***/azzh1/PURLS.

关键词： action recognition computer vision contrastive learning large language model representation learning skeleton action recognition zero-shot learning

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 22 23 24 25 26 27 28 29 30 31 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：