检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

20,860 篇 会议
105 篇 期刊文献
43 册 图书

馆藏范围

21,007 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,620 篇 工学
- 11,056 篇 计算机科学与技术...
- 2,652 篇 机械工程
- 2,252 篇 软件工程
- 914 篇 光学工程
- 885 篇 电气工程
- 529 篇 控制科学与工程
- 477 篇 信息与通信工程
- 216 篇 测绘科学与技术
- 135 篇 生物工程
- 127 篇 生物医学工程（可授...
- 98 篇 电子科学与技术（可...
- 92 篇 仪器科学与技术
- 46 篇 安全科学与工程
- 40 篇 建筑学
- 40 篇 化学工程与技术
- 39 篇 土木工程
- 37 篇 交通运输工程
- 35 篇 力学（可授工学、理...
- 33 篇 航空宇航科学与技...
3,494 篇 医学
- 3,489 篇 临床医学
- 32 篇 基础医学(可授医学...
2,247 篇 理学
- 1,145 篇 物理学
- 1,081 篇 数学
- 401 篇 生物学
- 384 篇 统计学（可授理学、...
- 245 篇 系统科学
- 46 篇 化学
343 篇 管理学
- 176 篇 管理科学与工程(可...
- 168 篇 图书情报与档案管...
- 34 篇 工商管理
31 篇 法学
19 篇 农学
15 篇 教育学
8 篇 经济学
5 篇 艺术学
2 篇 军事学
1 篇 文学

主题

8,141 篇 computer vision
2,886 篇 training
2,841 篇 pattern recognit...
1,809 篇 computational mo...
1,715 篇 visualization
1,493 篇 cameras
1,433 篇 three-dimensiona...
1,433 篇 feature extracti...
1,366 篇 shape
1,360 篇 face recognition
1,243 篇 image segmentati...
1,135 篇 robustness
1,124 篇 semantics
992 篇 computer archite...
985 篇 object detection
982 篇 layout
959 篇 benchmark testin...
935 篇 codes
900 篇 computer science
898 篇 object recogniti...

机构

174 篇 univ sci & techn...
158 篇 univ chinese aca...
153 篇 carnegie mellon ...
145 篇 chinese univ hon...
109 篇 microsoft resear...
103 篇 zhejiang univ pe...
99 篇 swiss fed inst t...
95 篇 tsinghua univers...
90 篇 microsoft res as...
90 篇 tsinghua univ pe...
88 篇 shanghai ai lab ...
81 篇 zhejiang univers...
77 篇 alibaba grp peop...
74 篇 hong kong univ s...
73 篇 university of sc...
72 篇 peking univ peop...
72 篇 university of ch...
68 篇 shanghai jiao to...
66 篇 univ oxford oxfo...
65 篇 google res mount...

作者

80 篇 van gool luc
70 篇 zhang lei
58 篇 timofte radu
48 篇 yang yi
47 篇 luc van gool
46 篇 xiaoou tang
44 篇 tian qi
43 篇 darrell trevor
42 篇 loy chen change
42 篇 sun jian
41 篇 qi tian
40 篇 li stan z.
38 篇 li fei-fei
37 篇 chen xilin
36 篇 shan shiguang
35 篇 zhou jie
35 篇 vasconcelos nuno
35 篇 liu yang
35 篇 torralba antonio
34 篇 liu xiaoming

语言

20,982 篇 英文
10 篇 中文
7 篇 其他
5 篇 土耳其文
2 篇 日文
2 篇 葡萄牙文

检索条件"任意字段=2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016"

共 21008 条记录，以下是331-340 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

SPIN: Simultaneous Perception, Interaction and Navigation

SPIN: Simultaneous Perception, Interaction and Navigation

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Uppal, Shagun Agarwal, Ananye Xiong, Haoyu Shaw, Kenneth Pathak, Deepak Carnegie Mellon Univ Pittsburgh PA 15213 USA

ISBN: (纸本)9798350353006

While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there are a plethora of challenges in developing these systems such as coordination between the base and arm, reliance on onboard perception for perceiving and interacting with the environment, and most importantly, simultaneously integrating all these parts together. Prior works approach the problem using disentangled modular skills for mobility and manipulation that are trivially tied together. This causes several limitations such as compounding errors, delays in decision-making, and no whole-body coordination. In this work, we present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment. Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see, more specifically - to move in order to see and to see in order to move. This allows it to not only move around and interact with its environment but also, choose "when" to perceive "what" using an active visual system. We observe that such an agent learns to navigate around complex cluttered scenarios while displaying agile whole-body coordination using only ego-vision without needing to create environment maps. Videos are available at https://***

关键词： Active vision End-to-end learning Mobile Manipulation Whole-body coordination

来源：评论

学校读者我要写书评

暂无评论

Volumetric Environment Representation for vision-Language Navigation

Volumetric Environment Representation for Vision-Language Na...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Liu, Rui Wang, Wenguan Yan, Yi Zhejiang Univ CCAI ReLER Hangzhou Zhejiang Peoples R China

ISBN: (纸本)9798350353006

vision-language navigation (VLN) requires an agent to navigate through an 3D environment based on visual observations and natural language instructions. It is clear that the pivotal factor for successful navigation lies in the comprehensive scene understanding. Previous VLN agents employ monocular frameworks to extract 2D features of perspective views directly. Though straightforward, they struggle for capturing 3D geometry and semantics, leading to a partial and incomplete environment representation. To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells. For each cell, VER aggregates multi-view 2D features into such a unified 3D space via 2D-3D sampling. Through coarse-to-fine feature extraction and multi-task learning for VER, our agent predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly. Based on online collected VERs, our agent performs volume state estimation and builds episodic memory for predicting the next step. Experimental results show our environment representations from multi-task learning lead to evident performance gains on VLN. Our model achieves state-of-the-art performance across VLN benchmarks (R2R, REVERIE, and R4R).

关键词： Semantics

来源：评论

学校读者我要写书评

暂无评论

PEM: Prototype-based Efficient MaskFormer for Image Segmentation

PEM: Prototype-based Efficient MaskFormer for Image Segmenta...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Cavagnero, Niccolo Rosi, Gabriele Cuttano, Claudia Pistilli, Francesca Ciccone, Marco Averta, Giuseppe Cermelli, Fabio Politecn Torino Turin Italy Focoos AI Rome Italy

ISBN: (纸本)9798350353006

Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic, under a single unified framework. To achieve such impressive performance, these architectures employ intensive operations and require substantial computational resources, which are often not available, especially on edge devices. To fill this gap, we propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks. PEM proposes a novel prototype-based cross-attention which leverages the redundancy of visual features to restrict the computation and improve the efficiency without harming the performance. In addition, PEM introduces an efficient multi-scale feature pyramid network, capable of extracting features that have high semantic content in an efficient way, thanks to the combination of deformable convolutions and context-based self-modulation. We benchmark the proposed PEM architecture on two tasks, semantic and panoptic segmentation, evaluated on two different datasets, Cityscapes and ADE20K. PEM demonstrates outstanding performance on every task and dataset, outperforming task-specific architectures while being comparable and even better than computationally expensive baselines. Code is available at https://***/NiccoloCavagnero/PEM.

关键词： computer vision Efficient Neural Network Image Segmentation

来源：评论

学校读者我要写书评

暂无评论

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image recognition

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Ding, Xiaohan Zhang, Yiyuan Ge, Yixiao Zhao, Sijie Song, Lin Yue, Xiangyu Shan, Ying Tencent AI Lab Shenzhen Peoples R China Chinese Univ Hong Kong Hong Kong Peoples R China

ISBN: (纸本)9798350353013;9798350353006

Large-kernel convolutional neural networks (ConvNets) have recently received extensive research attention, but two unresolved and critical issues demand further investigation. 1) The architectures of existing large-kernel ConvNets largely follow the design principles of conventional ConvNets or transformers, while the architectural design for large-kernel ConvNets remains under-addressed. 2) As transformers have dominated multiple modalities, it remains to be investigated whether ConvNets also have a strong universal perception ability in domains beyond vision. In this paper, we contribute from two aspects. 1) We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels - they can see wide without going deep. Following such guidelines, our proposed large-kernel ConvNet shows leading performance in image recognition (ImageNet accuracy of 88.0%, ADE20K mIoU of 55.6%, and COCO box AP of 56.4%), demonstrating better performance and higher speed than the recent powerful competitors. 2) We discover large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient. With certain modality-related pre-processing approaches, the proposed model achieves state-of-the-art performance on time-series forecasting and audio recognition tasks even without modality-specific customization to the architecture. All the code and models are publicly available on GitHub and Huggingface.

关键词： Large-Kernel ConvNets Multimodal Learning Network Architecture

来源：评论

学校读者我要写书评

暂无评论

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimo...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Tong, Shengbang Liu, Zhuang Zhai, Yuexiang Ma, Yi Lecun, Yann Xie, Saining NYU New York NY 10003 USA Meta FAIR Menlo Pk CA 94025 USA Univ Calif Berkeley Berkeley CA USA

ISBN: (纸本)9798350353006

Is vision good enough for language? Recent advancements in multimodal models primarily stem from the powerful reasoning abilities of large language models (LLMs). However, the visual component typically depends only on the instance-level contrastive language-image pre-training ( CLIP). Our research reveals that the visual capabilities in recent MultiModal LLMs (MLLMs) still exhibit systematic shortcomings. To understand the roots of these errors, we explore the gap between the visual embedding space of CLIP and vision-only self-supervised learning. We identify "CLIP-blind pairs" - images that CLIP perceives as similar despite their clear visual differences. With these pairs, we construct the Multimodal Visual patterns (MMVP) benchmark. MMVP exposes areas where state-of-the-art systems, including GPT-4V, struggle with straightforward questions across nine basic visual patterns, often providing incorrect answers and hallucinated explanations. We further evaluate various CLIP-based vision-and-language models and found a notable correlation between visual patterns that challenge CLIP models and those problematic for multimodal LLMs. As an initial effort to address these issues, we propose a Mixture of Features (MoF) approach, demonstrating that integrating vision self-supervised learning features with MLLMs can significantly enhance their visual grounding capabilities. Together, our research suggests visual representation learning remains an open challenge, and accurate visual grounding is crucial for future successful multimodal systems.

关键词： Multimodal LLMs vision Language Model

来源：评论

学校读者我要写书评

暂无评论

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine vision-Language Model

Jack of All Tasks, Master of Many: Designing General-purpose...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Pramanick, Shraman Han, Guangxing Hou, Rui Nag, Sayan Lim, Ser-Nam Ballas, Nicolas Wang, Qifan Chellappa, Rama Almahairi, Amjad Johns Hopkins Univ Baltimore MD 21218 USA Meta New York NY 10003 USA Univ Toronto Toronto ON Canada Univ Cent Florida Orlando FL 32816 USA

ISBN: (纸本)9798350353006

The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework. VistaLLM utilizes an instruction-guided image tokenizer that filters global embeddings using task descriptions to extract compressed and refined features from numerous images. Moreover, VistaLLM employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences, significantly improving over previously used uniform sampling. To bolster the desired capability of VistaLLM, we curate CoinIt, a comprehensive coarse-to-fine instruction tuning dataset with 6.8M samples. We also address the lack of multi-image grounding datasets by introducing a novel task, AttCoSeg (Attribute-level Co-Segmentation), which boosts the model's reasoning and grounding capability over multiple input images. Extensive experiments on a wide range of V- and VL tasks demonstrate the effectiveness of VistaLLM by achieving consistent state-of-the-art performance over strong base-lines across many downstream tasks. Our project page can be found at https://***/VistaLLM/.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf vision-Language Models

Emergent Open-Vocabulary Semantic Segmentation from Off-the-...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Luo, Jiayun Khandelwal, Siddhesh Sigal, Leonid Li, Boyang Nanyang Technol Univ Singapore Singapore Univ British Columbia Vector Inst AI Vancouver BC Canada

ISBN: (纸本)9798350353013;9798350353006

From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which prove effective for tasks like visual question answering. However, leveraging the learned association for open-vocabulary semantic segmentation remains a challenge. In this paper, we propose a simple, yet extremely effective, training-free technique, Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS) for this task. PnP-OVSS leverages a VLM with direct text-to-image cross-attention and an image-text matching loss. To balance between over-segmentation and under-segmentation, we introduce Salience Dropout;by iteratively dropping patches that the model is most attentive to, we are able to better resolve the entire extent of the segmentation mask. PnP-OVSS does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set. PnP-OVSS demonstrates substantial improvements over comparable baselines (+29.4% mIoU on Pascal VOC, +13.2% mIoU on Pascal Context, +14.0% mIoU on MS COCO, +2.4% mIoU on COCO Stuff) and even outperforms most baselines that conduct additional network training on top of pretrained VLMs. Our codebase is at https://***/letitiabanana/PnP-OVSS.

关键词： open-vocabulary semantic segmentation training-free

来源：评论

学校读者我要写书评

暂无评论

MULTIFLOW: Shifting Towards Task-Agnostic vision-Language Pruning

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pr...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Farina, Matteo Mancini, Massimiliano Cunegatti, Elia Liu, Gaowen Iacca, Giovanni Ricci, Elisa Univ Trento Trento Italy Cisco Res Res Triangle Pk NC USA Fdn Bruno Kessler Povo Italy

ISBN: (纸本)9798350353006

While excellent in transfer learning, vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first, gradient-free, pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neu-rons it connects;and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of the cases, paving the way towards addressing TA-VLP. The code is publicly available at https://***/FarinaMatteo/multiflow.

关键词： multimodal learning neural network pruning sparse neural networks transfer learning vision-language models

来源：评论

学校读者我要写书评

暂无评论

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

RegionPLC: Regional Point-Language Contrastive Learning for ...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Yang, Jihan Ding, Runyu Deng, Weipeng Wang, Zhe Qi, Xiaojuan Univ Hong Kong Hong Kong Peoples R China SenseTime Res Hong Kong Peoples R China

ISBN: (纸本)9798350353006

We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories. Specifically, based on our empirical studies, we introduce a 3D-aware SFusion strategy that fuses 3D vision-language pairs derived from multiple 2D foundation models, yielding high-quality, dense region-level language descriptions without human 3D annotations. Subsequently, we devise a region-aware point-discriminative contrastive learning objective to enable robust and effective 3D learning from dense regional language supervision. We carry out extensive experiments on ScanNet, ScanNet200, and nuScenes datasets, and our model outperforms prior 3D open-world scene understanding approaches by an average of 17.2% and 9.1% for semantic and instance segmentation, respectively, while maintaining greater scalability and lower resource demands. Furthermore, our method has the flexibility to be effortlessly integrated with language models to enable open-ended grounded 3D reasoning without extra task-specific training. Code will be released at github.

关键词： 3D vision embodied AI Open-world

来源：评论

学校读者我要写书评

暂无评论

Unravelling Robustness of Deep Face recognition Networks Against Illicit Drug Abuse Images

Unravelling Robustness of Deep Face Recognition Networks Aga...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Dhake, Hruturaj Agarwal, Akshay IISER Bhopal Data Sci & Engn Bhopal India

ISBN: (纸本)9798350365474

Alteration in facial features can lead to a significant drop in recognition performance. These alterations can be due to several factors: one such prominent and less explored factor is illicit drug abuse. To advance the understanding of how drug abuse faces affect the performance of state-of-the-art deep face recognition (DFR) networks, in this study, we have utilized clean and illicit drug abuse faces. Extensive studies are performed on deep face recognition and soft biometric identification, such as gender, ethnicity, and expression recognition. It is observed that illicit drug abuse not only impacts the identity recognition performance but also degrades the soft biometrics identification accuracy. Therefore, to advance the integrity of DFR, we have performed the detection of illicit drug abuse as a potential solution to its mitigation. In the end, the robustness of the drug abuse face detector is evaluated under the prominent use of social-media filters on face images.

关键词：

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 30 31 32 33 34 35 36 37 38 39 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：