We explore the boundaries of scaling up a multilingual vision and language model, both in terms of the size of its components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. Our model advances the state of the art on most vision-and-language benchmarks considered (20+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
Recently, efficient Vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with a multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using a larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by combining global and local information in parallel. Building upon our solutions, we introduce SHViT, a Single-Head Vision Transformer that obtains a state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on a GPU, a CPU, and an iPhone 12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using a Mask R-CNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively.
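The abstract does not spell out the layer definition, so the following is a minimal PyTorch sketch of the general idea of a single-head mixer: global attention is applied to a subset of channels while the remaining channels go through a cheap local depthwise convolution, and the two paths are concatenated. All names and dimensions (SingleHeadMixer, attn_dim, qk_dim) are illustrative assumptions, not the released SHViT code.

```python
# Minimal sketch (not the official SHViT implementation): a single-head
# attention module that applies global attention to part of the channels and a
# local depthwise-conv path to the rest, then merges them.
import torch
import torch.nn as nn


class SingleHeadMixer(nn.Module):
    def __init__(self, dim: int, attn_dim: int = 16, qk_dim: int = 16):
        super().__init__()
        self.attn_dim = attn_dim          # channels routed through global attention
        self.local_dim = dim - attn_dim   # channels handled by a local conv path
        self.qk_dim = qk_dim
        self.scale = qk_dim ** -0.5
        self.qkv = nn.Conv2d(attn_dim, qk_dim * 2 + attn_dim, kernel_size=1)
        self.proj = nn.Conv2d(attn_dim, attn_dim, kernel_size=1)
        self.local = nn.Conv2d(self.local_dim, self.local_dim, kernel_size=3,
                               padding=1, groups=self.local_dim)  # depthwise conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xa, xl = torch.split(x, [self.attn_dim, self.local_dim], dim=1)
        q, k, v = torch.split(self.qkv(xa),
                              [self.qk_dim, self.qk_dim, self.attn_dim], dim=1)
        q = q.flatten(2).transpose(1, 2)              # (b, hw, qk_dim)
        k = k.flatten(2)                              # (b, qk_dim, hw)
        v = v.flatten(2).transpose(1, 2)              # (b, hw, attn_dim)
        attn = (q @ k * self.scale).softmax(dim=-1)   # single head: no head split
        xa = (attn @ v).transpose(1, 2).reshape(b, self.attn_dim, h, w)
        xa = self.proj(xa)
        xl = self.local(xl)                           # local information in parallel
        return torch.cat([xa, xl], dim=1)


if __name__ == "__main__":
    m = SingleHeadMixer(dim=64)
    print(m(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```

Keeping a single attention head over a channel subset avoids the per-head bookkeeping of multi-head attention, which is the redundancy the abstract targets.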
We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories. Specifically, based on our empirical studies, we introduce a 3D-aware SFusion strategy that fuses 3D vision-language pairs derived from multiple 2D foundation models, yielding high-quality, dense region-level language descriptions without human 3D annotations. Subsequently, we devise a region-aware point-discriminative contrastive learning objective to enable robust and effective 3D learning from dense regional language supervision. We carry out extensive experiments on the ScanNet, ScanNet200, and nuScenes datasets, and our model outperforms prior 3D open-world scene understanding approaches by an average of 17.2% and 9.1% for semantic and instance segmentation, respectively, while maintaining greater scalability and lower resource demands. Furthermore, our method can be effortlessly integrated with language models to enable open-ended grounded 3D reasoning without extra task-specific training. Code will be released on GitHub.
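As a rough illustration of region-level point-language contrastive training, here is a simplified sketch that mean-pools point features per region and applies a symmetric InfoNCE loss against the paired text embeddings. The paper's point-discriminative objective differs in detail, and all function and argument names below are assumptions rather than the RegionPLC code.

```python
# Illustrative sketch (not the released RegionPLC objective): pull the pooled
# point features of each region toward its paired language embedding and push
# them away from the embeddings of other regions.
import torch
import torch.nn.functional as F


def region_point_language_contrastive(point_feats: torch.Tensor,
                                       region_ids: torch.Tensor,
                                       text_embeds: torch.Tensor,
                                       temperature: float = 0.07) -> torch.Tensor:
    """point_feats: (N, D) per-point features; region_ids: (N,) index into
    text_embeds for each point (-1 = unlabeled); text_embeds: (R, D) one
    language embedding per region description."""
    valid = region_ids >= 0
    feats, ids = point_feats[valid], region_ids[valid]
    R, D = text_embeds.shape
    # Mean-pool point features per region (a simplification of the paper's
    # point-discriminative formulation).
    pooled = torch.zeros(R, D, device=feats.device).index_add_(0, ids, feats)
    counts = torch.zeros(R, device=feats.device).index_add_(
        0, ids, torch.ones_like(ids, dtype=feats.dtype)).clamp(min=1)
    pooled = F.normalize(pooled / counts[:, None], dim=-1)
    text = F.normalize(text_embeds, dim=-1)
    logits = pooled @ text.t() / temperature          # (R, R) similarity matrix
    targets = torch.arange(R, device=feats.device)
    # Symmetric InfoNCE over regions and their paired descriptions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    loss = region_point_language_contrastive(
        torch.randn(1000, 256), torch.randint(-1, 8, (1000,)), torch.randn(8, 256))
    print(loss.item())
```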
From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, an association that proves effective for tasks like visual question answering. However, leveraging the learned association for open-vocabulary semantic segmentation remains a challenge. In this paper, we propose a simple yet extremely effective training-free technique, Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS), for this task. PnP-OVSS leverages a VLM with direct text-to-image cross-attention and an image-text matching loss. To balance between over-segmentation and under-segmentation, we introduce Salience Dropout; by iteratively dropping patches that the model is most attentive to, we are able to better resolve the entire extent of the segmentation mask. PnP-OVSS does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set. PnP-OVSS demonstrates substantial improvements over comparable baselines (+29.4% mIoU on Pascal VOC, +13.2% mIoU on Pascal Context, +14.0% mIoU on MS COCO, +2.4% mIoU on COCO Stuff) and even outperforms most baselines that conduct additional network training on top of pretrained VLMs. Our codebase is at https://***/letitiabanana/PnP-OVSS.
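To make the Salience Dropout loop concrete, the hedged sketch below iteratively masks the most-attended patches and accumulates the resulting attention maps so the full extent of the object is eventually covered. The cross_attention_map callable is an assumed stand-in for the VLM's text-to-image attention, not the PnP-OVSS interface, and the round/drop counts are arbitrary.

```python
# Illustrative sketch of the Salience Dropout idea (not the PnP-OVSS code):
# hide the patches the model attends to most, re-query the cross-attention,
# and take the per-patch maximum across rounds.
import torch


def salience_dropout(cross_attention_map, image_patches: torch.Tensor,
                     text: str, rounds: int = 3, drop_per_round: int = 10):
    """cross_attention_map(patches, text) -> (num_patches,) attention scores is
    an assumed interface to the underlying VLM; image_patches is (P, D)."""
    keep = torch.ones(image_patches.shape[0], dtype=torch.bool)
    accumulated = torch.zeros(image_patches.shape[0])
    for _ in range(rounds):
        scores = cross_attention_map(image_patches * keep[:, None], text)
        accumulated = torch.maximum(accumulated, scores * keep)
        # Drop the currently most-attended visible patches for the next round.
        visible = scores.masked_fill(~keep, float("-inf"))
        drop = visible.topk(min(drop_per_round, int(keep.sum()))).indices
        keep[drop] = False
    return accumulated  # per-patch salience, thresholded later into a mask


if __name__ == "__main__":
    fake_vlm = lambda patches, text: patches.abs().mean(dim=1)  # stand-in scorer
    print(salience_dropout(fake_vlm, torch.randn(196, 768), "a dog").shape)
```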
Domain Generalization (DG) aims to resolve distribution shifts between source and target domains, and current DG methods default to the setting in which data from source and target domains share identical categories. Nevertheless, in practical scenarios target domains can contain unseen classes. To address this issue, Open Set Domain Generalization (OSDG) has emerged, and several methods have been proposed specifically for it. However, most existing methods adopt complex architectures with only slight improvements over DG methods. Recently, vision-language models (VLMs) have been introduced to DG following the fine-tuning paradigm, but fine-tuning their large vision backbones incurs huge training overhead. Therefore, in this paper, we transfer knowledge from VLMs to lightweight vision models and improve robustness by introducing Perturbation Distillation (PD) from three perspectives, Score, Class, and Instance (SCI), named SCI-PD. Moreover, previous methods are evaluated on benchmarks with identical and fixed splits, ignoring the divergence between source domains. These methods are revealed to suffer sharp performance decay under our proposed benchmark, Hybrid Domain Generalization (HDG), and a novel metric, H²-CV, which construct varied splits to comprehensively assess the robustness of algorithms. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms on multiple datasets, especially improving robustness when confronting data scarcity.
As a new embodied vision task, Instance ImageGoal Navigation (IIN) aims to navigate to a specific object depicted by a goal image in an unexplored environment. The main challenge of this task lies in identifying the target object from different viewpoints while rejecting similar distractors. Existing ImageGoal Navigation methods usually adopt a simple Exploration-Exploitation framework and ignore identifying the specific instance during navigation. In this work, we propose to imitate the human behaviour of "getting closer to confirm" when distinguishing objects from a distance. Specifically, we design a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image-goal navigation. Our method allows for active switching among the exploration, verification, and exploitation actions, thereby facilitating the agent in making reasonable decisions in different situations. On the challenging Habitat-Matterport 3D Semantic (HM3D-SEM) dataset, our method surpasses previous state-of-the-art work, with a classical segmentation model (0.684 vs. 0.561 success) or a robust model (0.702 vs. 0.561 success). Our code will be made publicly available at https://***/XiaohanLei/IEVE.
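The switching behaviour can be pictured as a small state machine like the hedged sketch below. The thresholds and the agent interface (detect_candidate, match_to_goal, act_*) are purely illustrative assumptions, not the IEVE implementation.

```python
# Illustrative sketch of the "explore -> verify -> exploit" switching logic
# described above; a real controller would call navigation_step in a loop
# until the episode ends.
from enum import Enum, auto


class Mode(Enum):
    EXPLORE = auto()
    VERIFY = auto()
    EXPLOIT = auto()


def navigation_step(mode: Mode, obs, goal_image, agent) -> Mode:
    """obs, goal_image, and agent (with detect/match/act methods) are
    stand-ins for the real perception and control modules."""
    if mode is Mode.EXPLORE:
        candidate = agent.detect_candidate(obs, goal_image)
        if candidate is not None:
            return Mode.VERIFY           # something that might be the goal: get closer
        agent.act_explore(obs)
        return Mode.EXPLORE
    if mode is Mode.VERIFY:
        agent.act_approach(obs)          # "getting closer to confirm"
        score = agent.match_to_goal(obs, goal_image)
        if score > 0.8:
            return Mode.EXPLOIT          # confirmed: head for the instance
        if score < 0.2:
            return Mode.EXPLORE          # distractor: resume exploration
        return Mode.VERIFY
    agent.act_goto_target(obs)
    return Mode.EXPLOIT
```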
We show how shadows can be efficiently generated in differentiable rendering of triangle meshes. Our central observation is that pre-filtered shadow mapping, a technique for approximating shadows based on rendering from the perspective of a light, can be combined with existing differentiable rasterizers to yield differentiable visibility information. We demonstrate on several inverse graphics problems that differentiable shadow maps are orders of magnitude faster than differentiable light transport simulation with similar accuracy, while differentiable rasterization without shadows often fails to converge.
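A hedged sketch of the underlying idea, using variance shadow mapping as one concrete pre-filtered scheme built entirely from differentiable tensor ops. This illustrates the technique, not the paper's renderer; the kernel size and clamps are arbitrary choices.

```python
# Illustrative pre-filtered (variance) shadow mapping: blur the depth and
# squared-depth maps rendered from the light, then use Chebyshev's inequality
# to obtain a soft, differentiable visibility term.
import torch
import torch.nn.functional as F


def variance_shadow_visibility(light_depth: torch.Tensor,
                               receiver_depth: torch.Tensor,
                               kernel: int = 7,
                               min_variance: float = 1e-4) -> torch.Tensor:
    """light_depth: (1, 1, H, W) depth map rendered from the light's viewpoint;
    receiver_depth: (1, 1, H, W) light-space depth of the shaded surface points,
    already resampled into the shadow map's pixel grid."""
    # Pre-filter the first and second depth moments (simple box blur here).
    box = torch.ones(1, 1, kernel, kernel, device=light_depth.device) / kernel ** 2
    m1 = F.conv2d(light_depth, box, padding=kernel // 2)
    m2 = F.conv2d(light_depth ** 2, box, padding=kernel // 2)
    variance = (m2 - m1 ** 2).clamp(min=min_variance)
    diff = (receiver_depth - m1).clamp(min=0.0)
    # Chebyshev upper bound on the probability of being lit (1 when the
    # receiver is in front of the mean occluder depth).
    return variance / (variance + diff ** 2)


if __name__ == "__main__":
    d_light = torch.rand(1, 1, 64, 64, requires_grad=True)
    d_recv = torch.rand(1, 1, 64, 64)
    vis = variance_shadow_visibility(d_light, d_recv)
    vis.sum().backward()                  # gradients flow back to the depth map
    print(vis.shape, d_light.grad is not None)
```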
The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual tokens via adjusting RoIs, greatly reducing computational costs while still achieving promising performance. However, training SparseFormers from scratch is still expensive, and scaling up the number of parameters can be challenging. In this paper, we propose to bootstrap SparseFormers from ViT-based vision foundation models in a simple and efficient way. Since the majority of SparseFormer blocks are standard transformer ones, we can inherit weights from large-scale pre-trained vision transformers and freeze them as much as possible. Therefore, we only need to train the SparseFormer-specific lightweight focusing transformer to adjust token RoIs and fine-tune a few early pre-trained blocks to align the final token representation. In this way, we can bootstrap SparseFormer architectures from various large-scale pre-trained models (e.g., IN-21K pre-trained AugRegs or CLIPs) using a much smaller set of training samples (e.g., IN-1K), without labels or captions, within just a few hours. As a result, the bootstrapped unimodal SparseFormer (from AugReg-ViT-L/16-384) reaches 84.9% accuracy on IN-1K with only 49 tokens, and the multimodal SparseFormer bootstrapped from CLIPs also demonstrates notable zero-shot performance at highly reduced computational cost, without seeing any caption during the bootstrapping procedure. In addition, CLIP-bootstrapped SparseFormers, which align the output space with language without seeing a word, can serve as efficient vision encoders in multimodal large language models. Code and models are available at https://***/showlab/sparseformer.
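The bootstrapping recipe amounts to freezing the inherited backbone and training only a small set of new or early parameters. The hedged sketch below shows that pattern, assuming timm is available; the model name, the number of tuned early blocks, and the focusing_transformer placeholder are assumptions, not the SparseFormer release.

```python
# Illustrative sketch of the bootstrapping recipe: inherit and freeze most of a
# pre-trained ViT, then train only a lightweight focusing module plus a few
# early blocks.
import timm
import torch

vit = timm.create_model("vit_large_patch16_384", pretrained=True)  # AugReg weights

# 1) Freeze everything inherited from the pre-trained backbone.
for p in vit.parameters():
    p.requires_grad = False

# 2) Unfreeze a few early blocks so token representations can re-align.
early_blocks_to_tune = 4
for block in vit.blocks[:early_blocks_to_tune]:
    for p in block.parameters():
        p.requires_grad = True

# 3) A stand-in for the SparseFormer-specific focusing transformer that adjusts
#    token RoIs; it is new and therefore fully trainable.
focusing_transformer = torch.nn.TransformerEncoderLayer(
    d_model=vit.embed_dim, nhead=8, batch_first=True)

trainable = [p for p in list(vit.parameters()) + list(focusing_transformer.parameters())
             if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4, weight_decay=0.05)
print(sum(p.numel() for p in trainable) / 1e6, "M trainable parameters")
```

Because the optimizer only sees the unfrozen parameters, the memory and compute cost of bootstrapping stays far below full pre-training.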
The Vision Transformer (ViT) has shown high potential in video recognition, owing to its flexible design, adaptable self-attention mechanisms, and the efficacy of masked pretraining. Yet, it remains unclear how to adapt these pretrained short-term ViTs for temporal action detection (TAD) in untrimmed videos. Existing works treat them as off-the-shelf feature extractors for each short trimmed snippet without capturing the fine-grained relations among different snippets in a broader temporal context. To mitigate this issue, this paper focuses on designing a new mechanism for adapting these pre-trained ViT models as a unified long-form video transformer to fully unleash their modeling power in capturing inter-snippet relations, while still keeping low computation overhead and memory consumption for efficient TAD. To this end, we design effective cross-snippet propagation modules to gradually exchange short-term video information among different snippets at two levels. For inner-backbone information propagation, we introduce a cross-snippet propagation strategy to enable multi-snippet temporal feature interaction inside the backbone. For post-backbone information propagation, we propose temporal transformer layers for further clip-level modeling. With the plain ViT-B pre-trained with VideoMAE, our end-to-end temporal action detector (ViT-TAD) yields very competitive performance compared to previous temporal action detectors, reaching 69.5 average mAP on THUMOS14, 37.40 average mAP on ActivityNet-1.3, and 17.20 average mAP on FineAction.
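The post-backbone propagation step can be pictured as temporal transformer layers over stacked per-snippet features, as in the hedged sketch below; the dimensions, layer counts, and class name are illustrative assumptions, not the ViT-TAD code.

```python
# Illustrative post-backbone cross-snippet modeling: per-snippet features from
# a short-term ViT are stacked along time and passed through temporal
# transformer layers so information propagates across snippets.
import torch
import torch.nn as nn


class TemporalPropagation(nn.Module):
    def __init__(self, dim: int = 768, layers: int = 2, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, snippet_feats: torch.Tensor) -> torch.Tensor:
        # snippet_feats: (batch, num_snippets, dim), one pooled feature per snippet.
        return self.temporal(snippet_feats)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 768)            # 64 snippets from one untrimmed video
    print(TemporalPropagation()(feats).shape)  # torch.Size([2, 64, 768])
```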
For privacy and security reasons, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners, and these requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed from a pre-trained model while maintaining the rest. We define this problem as continual forgetting and identify two key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact brought by the forgetting procedure should be minimal. To address them, we propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we use LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), a simple group sparse regularization is adopted, enabling automatic selection of specific LoRA groups and zeroing out the others. GS-LoRA is effective, parameter-efficient, data-efficient, and easy to implement. We conduct extensive experiments on face recognition, object detection, and image classification, and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes. Code will be released at https://***/bjzhb666/GS-LoRA.
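The core mechanism, LoRA adapters on FFN layers combined with a group-lasso penalty that can zero out whole adapters, can be sketched as follows. The class name, rank, and penalty weight are assumptions for illustration, not the released GS-LoRA code.

```python
# Illustrative sketch: a LoRA adapter on a frozen linear layer plus a group
# sparse (group-lasso) penalty that treats each adapter as one group, so
# unneeded groups can be driven to zero.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # keep pre-trained weights frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_a.t() @ self.lora_b.t()


def group_sparse_penalty(lora_modules, weight: float = 1e-2) -> torch.Tensor:
    """Group lasso: sum over adapters of the L2 norm of all their parameters,
    encouraging whole adapters (groups) to be zeroed out."""
    penalty = 0.0
    for m in lora_modules:
        group = torch.cat([m.lora_a.flatten(), m.lora_b.flatten()])
        penalty = penalty + group.norm(p=2)
    return weight * penalty


if __name__ == "__main__":
    ffn = LoRALinear(nn.Linear(768, 3072))
    x = torch.randn(4, 768)
    loss = ffn(x).sum() + group_sparse_penalty([ffn])
    loss.backward()
    print(ffn.lora_a.grad.shape)
```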