Search Results - Inner Mongolia University Library

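Help

Field codes:

T = title (book title or article title), A = author (creator), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification code, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject library science or information science, author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (journal Computer Applications and Software, any field C++ or Basic, excluding subject Visual, years 2011-2016)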
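
The search syntax is regular enough to generate programmatically. The Python sketch below is a hypothetical helper, not part of the library's system. The names `term`, `group` and `combine` and the field-code table are illustrative assumptions. It enforces the two stated rules, uppercase operators and a single space around each one, and rebuilds Example 1.

```python
# Hypothetical helpers (not part of the library's system) for composing
# expert-search strings: FIELD=value terms joined by uppercase AND/OR/NOT,
# with exactly one space on each side of every operator.

FIELD_CODES = {
    "T": "title", "A": "author", "K": "subject term", "P": "publication name",
    "PU": "publisher", "O": "institution", "L": "CLC number",
    "C": "discipline code", "U": "all fields", "Y": "year",
}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Render one FIELD=value term, rejecting field codes not in the help text."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses so it is evaluated as a unit."""
    return f"({expr})"


def combine(*parts: str) -> str:
    """Join alternating operands and operators with single spaces.

    Odd positions must be uppercase operators; lowercase 'and'/'or' is
    rejected because the help text requires uppercase operators.
    """
    if len(parts) % 2 == 0:
        raise ValueError("expected: operand (operator operand)*")
    for i in range(1, len(parts), 2):
        if parts[i] not in OPERATORS:
            raise ValueError(f"expected AND/OR/NOT, got {parts[i]!r}")
    return " ".join(parts)


# Rebuild Example 1 from the help text.
example_one = combine(
    group(combine(term("K", "图书馆学"), "OR", term("K", "情报学"))),
    "AND", term("A", "范并思"),
    "AND", term("Y", "1982-2016"),
)
print(example_one)  # (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
```

Parenthesization is left to the caller through `group`. The sketch makes no assumption about how the catalog ranks AND, OR and NOT against each other.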

Refine Results

Document type

20,798 conference papers
88 journal articles
65 books

Holdings

20,950 electronic items
1 print holding

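Subject classification

13,275 Engineering
- 10,923 Computer Science and Technology...
- 2,484 Mechanical Engineering
- 2,307 Software Engineering
- 913 Optical Engineering
- 771 Electrical Engineering
- 556 Control Science and Engineering
- 405 Information and Communication Engineering
- 210 Surveying and Mapping Science and Technology
- 131 Biomedical Engineering (degrees...
- 104 Electronic Science and Technology (...
- 100 Bioengineering
- 92 Instrument Science and Technology
- 56 Chemical Engineering and Technology
- 52 Architecture
- 48 Civil Engineering
- 44 Safety Science and Engineering
- 38 Mechanics (degrees in Engineering, Sci...
- 38 Aeronautical and Astronautical Science and Tech...
- 35 Transportation Engineering
3,457 Medicine
- 3,449 Clinical Medicine
- 34 Basic Medicine (degrees in Medicine...
2,315 Science
- 1,154 Mathematics
- 1,132 Physics
- 417 Statistics (degrees in Science, ...
- 386 Biology
- 252 Systems Science
- 57 Chemistry
353 Management
- 184 Library, Information and Archives Manag...
- 176 Management Science and Engineering (...
- 32 Business Administration
28 Law
20 Agriculture
15 Education
9 Economics
8 Art
5 Literature
5 Military Science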

Topics

8,203 computer vision
3,010 pattern recognit...
2,732 training
1,769 computational mo...
1,657 visualization
1,483 cameras
1,415 shape
1,369 three-dimensiona...
1,369 face recognition
1,285 image segmentati...
1,272 feature extracti...
1,178 robustness
1,090 semantics
1,040 layout
1,007 object detection
975 object recogniti...
969 computer science
946 computer archite...
946 benchmark testin...
931 codes

Institutions

174 univ sci & techn...
154 carnegie mellon ...
148 univ chinese aca...
144 chinese univ hon...
113 microsoft resear...
103 zhejiang univ pe...
99 swiss fed inst t...
97 tsinghua univ pe...
93 tsinghua univers...
91 microsoft res as...
88 shanghai ai lab ...
81 zhejiang univers...
76 alibaba grp peop...
74 hong kong univ s...
73 university of sc...
72 peking univ peop...
69 university of ch...
68 shanghai jiao to...
66 google res mount...
66 univ oxford oxfo...

Authors

80 van gool luc
71 zhang lei
59 timofte radu
48 yang yi
47 xiaoou tang
44 darrell trevor
43 tian qi
43 luc van gool
42 loy chen change
42 sun jian
42 li fei-fei
40 qi tian
39 li stan z.
37 liu yang
37 chen xilin
36 shan shiguang
35 liu xiaoming
35 vasconcelos nuno
35 torralba antonio
32 zhou jie

Language

20,928 English
14 Chinese
6 Other
2 Japanese
2 Turkish

Search criteria: Any field = "2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009"

20,951 records in total; showing 331-340.

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ding, Xiaohan; Zhang, Yiyuan; Ge, Yixiao; Zhao, Sijie; Song, Lin; Yue, Xiangyu; Shan, Ying

Affiliations: Tencent AI Lab, Shenzhen, Peoples R China; Chinese Univ Hong Kong, Hong Kong, Peoples R China

ISBN: (print) 9798350353013; 9798350353006

Large-kernel convolutional neural networks (ConvNets) have recently received extensive research attention, but two unresolved and critical issues demand further investigation. 1) The architectures of existing large-kernel ConvNets largely follow the design principles of conventional ConvNets or transformers, while the architectural design for large-kernel ConvNets remains under-addressed. 2) As transformers have dominated multiple modalities, it remains to be investigated whether ConvNets also have a strong universal perception ability in domains beyond vision. In this paper, we contribute from two aspects. 1) We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels - they can see wide without going deep. Following such guidelines, our proposed large-kernel ConvNet shows leading performance in image recognition (ImageNet accuracy of 88.0%, ADE20K mIoU of 55.6%, and COCO box AP of 56.4%), demonstrating better performance and higher speed than the recent powerful competitors. 2) We discover large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient. With certain modality-related pre-processing approaches, the proposed model achieves state-of-the-art performance on time-series forecasting and audio recognition tasks even without modality-specific customization to the architecture. All the code and models are publicly available on GitHub and Huggingface.

Keywords: Large-Kernel ConvNets; Multimodal Learning; Network Architecture
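
The abstract's phrase "they can see wide without going deep" can be made concrete with standard receptive-field arithmetic. The sketch below computes the theoretical receptive field of stacked convolutions. It assumes stride 1, no dilation and no pooling. The 31x31 kernel size is an arbitrary illustrative choice, not a claim about the UniRepLKNet configuration.

```python
import math


def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Theoretical receptive field (side length in pixels) of num_layers stacked
    kernel_size x kernel_size convolutions, assuming stride 1, no dilation,
    and no pooling."""
    return 1 + num_layers * (kernel_size - 1)


def layers_needed(kernel_size: int, target: int) -> int:
    """Fewest stacked layers of kernel_size whose receptive field reaches target."""
    return math.ceil((target - 1) / (kernel_size - 1))


LARGE_KERNEL = 31  # illustrative size only, not taken from the abstract
wide = receptive_field(LARGE_KERNEL, 1)
print(wide)                    # 31
print(layers_needed(3, wide))  # 15: a 3x3 stack must be 15 layers deep to match
```

The calculation covers only the theoretical field. How much of that field a trained network actually uses (its effective receptive field) is a separate empirical question.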

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Tong, Shengbang; Liu, Zhuang; Zhai, Yuexiang; Ma, Yi; Lecun, Yann; Xie, Saining

Affiliations: NYU, New York, NY 10003, USA; Meta FAIR, Menlo Pk, CA 94025, USA; Univ Calif Berkeley, Berkeley, CA, USA

ISBN: (print) 9798350353006

Is vision good enough for language? Recent advancements in multimodal models primarily stem from the powerful reasoning abilities of large language models (LLMs). However, the visual component typically depends only on the instance-level contrastive language-image pre-training (CLIP). Our research reveals that the visual capabilities in recent MultiModal LLMs (MLLMs) still exhibit systematic shortcomings. To understand the roots of these errors, we explore the gap between the visual embedding space of CLIP and vision-only self-supervised learning. We identify "CLIP-blind pairs" - images that CLIP perceives as similar despite their clear visual differences. With these pairs, we construct the Multimodal Visual Patterns (MMVP) benchmark. MMVP exposes areas where state-of-the-art systems, including GPT-4V, struggle with straightforward questions across nine basic visual patterns, often providing incorrect answers and hallucinated explanations. We further evaluate various CLIP-based vision-and-language models and find a notable correlation between visual patterns that challenge CLIP models and those problematic for multimodal LLMs. As an initial effort to address these issues, we propose a Mixture of Features (MoF) approach, demonstrating that integrating vision self-supervised learning features with MLLMs can significantly enhance their visual grounding capabilities. Together, our research suggests visual representation learning remains an open challenge, and accurate visual grounding is crucial for future successful multimodal systems.

Keywords: Multimodal LLMs; Vision Language Model
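
The abstract defines CLIP-blind pairs as images that CLIP embeds as similar despite clear visual differences. They are found by contrasting CLIP with a vision-only self-supervised encoder. The sketch below implements that selection criterion over precomputed embeddings. The thresholds `clip_min` and `ssl_max` and the toy two-dimensional vectors are illustrative assumptions, not values taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def clip_blind_pairs(clip_emb, ssl_emb, clip_min=0.95, ssl_max=0.6):
    """Return index pairs (i, j) that the first encoder sees as near-duplicates
    (similarity >= clip_min) while the second sees them as different
    (similarity <= ssl_max). Thresholds are hypothetical."""
    pairs = []
    for i, j in combinations(range(len(clip_emb)), 2):
        if (cosine(clip_emb[i], clip_emb[j]) >= clip_min
                and cosine(ssl_emb[i], ssl_emb[j]) <= ssl_max):
            pairs.append((i, j))
    return pairs


# Toy embeddings for three images: images 0 and 1 look alike to "CLIP"
# but are orthogonal in the self-supervised space.
clip_emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
ssl_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(clip_blind_pairs(clip_emb, ssl_emb))  # [(0, 1)]
```

The pairwise loop is quadratic in the number of images. That is fine for a toy set, but a real collection would need batched or approximate nearest-neighbour search.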

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Pramanick, Shraman; Han, Guangxing; Hou, Rui; Nag, Sayan; Lim, Ser-Nam; Ballas, Nicolas; Wang, Qifan; Chellappa, Rama; Almahairi, Amjad

Affiliations: Johns Hopkins Univ, Baltimore, MD 21218, USA; Meta, New York, NY 10003, USA; Univ Toronto, Toronto, ON, Canada; Univ Cent Florida, Orlando, FL 32816, USA

ISBN: (print) 9798350353006

The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework. VistaLLM utilizes an instruction-guided image tokenizer that filters global embeddings using task descriptions to extract compressed and refined features from numerous images. Moreover, VistaLLM employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences, significantly improving over previously used uniform sampling. To bolster the desired capability of VistaLLM, we curate CoinIt, a comprehensive coarse-to-fine instruction tuning dataset with 6.8M samples. We also address the lack of multi-image grounding datasets by introducing a novel task, AttCoSeg (Attribute-level Co-Segmentation), which boosts the model's reasoning and grounding capability over multiple input images. Extensive experiments on a wide range of V- and VL tasks demonstrate the effectiveness of VistaLLM by achieving consistent state-of-the-art performance over strong baselines across many downstream tasks. Our project page can be found at https://***/VistaLLM/.

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Luo, Jiayun; Khandelwal, Siddhesh; Sigal, Leonid; Li, Boyang

Affiliations: Nanyang Technol Univ, Singapore, Singapore; Univ British Columbia, Vector Inst AI, Vancouver, BC, Canada

ISBN: (print) 9798350353013; 9798350353006

From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which prove effective for tasks like visual question answering. However, leveraging the learned association for open-vocabulary semantic segmentation remains a challenge. In this paper, we propose a simple, yet extremely effective, training-free technique, Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS) for this task. PnP-OVSS leverages a VLM with direct text-to-image cross-attention and an image-text matching loss. To balance between over-segmentation and under-segmentation, we introduce Salience Dropout; by iteratively dropping patches that the model is most attentive to, we are able to better resolve the entire extent of the segmentation mask. PnP-OVSS does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set. PnP-OVSS demonstrates substantial improvements over comparable baselines (+29.4% mIoU on Pascal VOC, +13.2% mIoU on Pascal Context, +14.0% mIoU on MS COCO, +2.4% mIoU on COCO Stuff) and even outperforms most baselines that conduct additional network training on top of pretrained VLMs. Our codebase is at https://***/letitiabanana/PnP-OVSS.

Keywords: open-vocabulary semantic segmentation; training-free
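
Every gain reported in the abstract above is measured in mIoU (mean intersection-over-union), the standard semantic-segmentation metric. As a reference point, here is a minimal pure-Python sketch of the metric over a single flattened label map. It is not the evaluation code used in the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for one label map.

    pred and target are flat sequences of integer class labels, one per pixel.
    Classes absent from both maps are skipped rather than counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
print(mean_iou(pred, target, 3))  # 0.5 (each class has IoU 0.5)
```

Benchmark protocols usually accumulate intersections and unions over the whole dataset before dividing. Averaging per-image scores, as a single call here would suggest, can give a different number.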

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Farina, Matteo; Mancini, Massimiliano; Cunegatti, Elia; Liu, Gaowen; Iacca, Giovanni; Ricci, Elisa

Affiliations: Univ Trento, Trento, Italy; Cisco Res, Res Triangle Pk, NC, USA; Fdn Bruno Kessler, Povo, Italy

ISBN: (print) 9798350353006

While excellent in transfer learning, vision-language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first, gradient-free, pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of the cases, paving the way towards addressing TA-VLP. The code is publicly available at https://***/FarinaMatteo/multiflow.

Keywords: multimodal learning; neural network pruning; sparse neural networks; transfer learning; vision-language models
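
To make "pruning ratio" concrete, the sketch below implements plain global magnitude pruning on a flat list of weights. This is a deliberately simpler baseline, not MULTIFLOW. According to the abstract, MULTIFLOW also weighs each parameter by the information flow through the neurons it connects, and that scoring is not reproduced here. The function name `magnitude_prune` is hypothetical.

```python
def magnitude_prune(weights: list[float], ratio: float) -> list[float]:
    """Zero out the `ratio` fraction of weights with the smallest absolute value.

    Plain global magnitude pruning, shown as a simple baseline (NOT MULTIFLOW).
    Ties are broken by original position because sorted() is stable.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    k = int(round(ratio * len(weights)))  # number of weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:k])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


print(magnitude_prune([0.5, -0.1, 0.05, -0.8], 0.5))  # [0.5, 0.0, 0.0, -0.8]
```

In a real network the same idea is applied per layer or globally across all layers. Frameworks such as PyTorch provide it through masks rather than by overwriting values.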

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yang, Jihan; Ding, Runyu; Deng, Weipeng; Wang, Zhe; Qi, Xiaojuan

Affiliations: Univ Hong Kong, Hong Kong, Peoples R China; SenseTime Res, Hong Kong, Peoples R China

ISBN: (print) 9798350353006

We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories. Specifically, based on our empirical studies, we introduce a 3D-aware SFusion strategy that fuses 3D vision-language pairs derived from multiple 2D foundation models, yielding high-quality, dense region-level language descriptions without human 3D annotations. Subsequently, we devise a region-aware point-discriminative contrastive learning objective to enable robust and effective 3D learning from dense regional language supervision. We carry out extensive experiments on ScanNet, ScanNet200, and nuScenes datasets, and our model outperforms prior 3D open-world scene understanding approaches by an average of 17.2% and 9.1% for semantic and instance segmentation, respectively, while maintaining greater scalability and lower resource demands. Furthermore, our method has the flexibility to be effortlessly integrated with language models to enable open-ended grounded 3D reasoning without extra task-specific training. Code will be released on GitHub.

Keywords: 3D vision; embodied AI; open-world

Unravelling Robustness of Deep Face Recognition Networks Against Illicit Drug Abuse Images

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Dhake, Hruturaj; Agarwal, Akshay

Affiliations: IISER Bhopal, Data Sci & Engn, Bhopal, India

ISBN: (print) 9798350365474

Alteration in facial features can lead to a significant drop in recognition performance. These alterations can be due to several factors: one such prominent and less explored factor is illicit drug abuse. To advance the understanding of how drug abuse faces affect the performance of state-of-the-art deep face recognition (DFR) networks, in this study, we have utilized clean and illicit drug abuse faces. Extensive studies are performed on deep face recognition and soft biometric identification, such as gender, ethnicity, and expression recognition. It is observed that illicit drug abuse not only impacts the identity recognition performance but also degrades the soft biometrics identification accuracy. Therefore, to advance the integrity of DFR, we have performed the detection of illicit drug abuse as a potential solution to its mitigation. In the end, the robustness of the drug abuse face detector is evaluated under the prominent use of social-media filters on face images.

Equivariant Multi-Modality Image Fusion

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Zixiang; Hai, Haowen; Zhang, Jiangshe; Zhang, Yulun; Zhang, Kai; Xu, Shuang; Chen, Dongdong; Timofte, Radu; Van Gool, Luc

Affiliations: Xi An Jiao Tong Univ, Xian, Peoples R China; Swiss Fed Inst Technol, Zurich, Switzerland; Shanghai Jiao Tong Univ, Shanghai, Peoples R China; Nanjing Univ, Nanjing, Peoples R China; Northwestern Polytech Univ, Xian, Peoples R China; Heriot Watt Univ, Edinburgh, Midlothian, Scotland; Univ Wurzburg, Wurzburg, Germany; INSAIT, Sofia, Bulgaria

ISBN: (print) 9798350353006

Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks. The code is available at https://***/Zhaozixiang1228/MMIF-EMMA.

Keywords: image fusion; low-level vision
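
The equivariance prior in the EMMA abstract can be stated as f(T(x)) = T(f(x)): transforming the inputs and then fusing should give the same result as fusing and then transforming. The sketch below only checks that property for two toy fusion rules on tiny list-of-rows images. Both fusion rules (`fuse_max`, `fuse_halves`) are hypothetical stand-ins. According to the abstract, EMMA learns its fusion network and encourages the prior through training rather than guaranteeing it by construction.

```python
def fuse_max(a, b):
    """Toy pixel-wise max fusion of two same-sized grayscale images."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_halves(a, b):
    """Toy position-dependent fusion: left half of each row from a, right half from b."""
    mid = len(a[0]) // 2
    return [ra[:mid] + rb[mid:] for ra, rb in zip(a, b)]


def rot90(img):
    """Rotate a list-of-rows image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def hflip(img):
    """Mirror a list-of-rows image left to right."""
    return [row[::-1] for row in img]


def is_equivariant(fuse, transform, a, b):
    """Check fuse(T(a), T(b)) == T(fuse(a, b)) for one input pair."""
    return fuse(transform(a), transform(b)) == transform(fuse(a, b))


a = [[1, 5], [3, 2]]
b = [[4, 0], [6, 1]]
print(is_equivariant(fuse_max, rot90, a, b))  # True
print(is_equivariant(fuse_halves, hflip, [[1, 1], [1, 1]], [[0, 0], [0, 0]]))  # False
```

Any pixel-wise rule commutes with pixel permutations such as flips and rotations, so `fuse_max` passes. Rules that depend on absolute position, like `fuse_halves`, can fail.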

Sharingan: A Transformer Architecture for Multi-Person Gaze Following

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Tafasca, Samy; Gupta, Anshul; Odobez, Jean-Marc

Affiliations: Idiap Res Inst, Martigny, Switzerland; Ecole Polytech Fed Lausanne, Lausanne, Switzerland

ISBN: (print) 9798350353013; 9798350353006

Gaze is a powerful form of non-verbal communication that humans develop from an early age. As such, modeling this behavior is an important task that can benefit a broad set of application domains ranging from robotics to sociology. In particular, the gaze following task in computer vision is defined as the prediction of the 2D pixel coordinates where a person in the image is looking. Previous attempts in this area have primarily centered on CNN-based architectures, but they have been constrained by the need to process one person at a time, which proves to be highly inefficient. In this paper, we introduce a novel and effective multi-person transformer-based architecture for gaze prediction. While there exist prior works using transformers for multi-person gaze prediction [38, 39], they use a fixed set of learnable embeddings to decode both the person and its gaze target, which requires a matching step afterward to link the predictions with the annotations. Thus, it is difficult to quantitatively evaluate these methods reliably with the available benchmarks, or integrate them into a larger human behavior understanding system. Instead, we are the first to propose a multi-person transformer-based architecture that maintains the original task formulation and ensures control over the people fed as input. Our main contribution lies in encoding the person-specific information into a single controlled token to be processed alongside image tokens and using its output for prediction based on a novel multiscale decoding mechanism. Our new architecture achieves state-of-the-art results on the GazeFollow, VideoAttentionTarget, and ChildPlay datasets and outperforms comparable multi-person architectures with a notable margin. Our code, checkpoints, and data extractions will be made publicly available soon.

Keywords: computer vision; deep learning; gaze following

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cheng, Sijie; Guo, Zhicheng; Wu, Jingwen; Fang, Kechen; Li, Peng; Liu, Huaping; Liu, Yang

Affiliations: Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China; Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China; Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada; Tsinghua Univ, Zhili Coll, Beijing, Peoples R China; 01 Ai, Beijing, Peoples R China

ISBN: (print) 9798350353006

Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from ego-centric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate twenty-one popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as the automatic judge to compute single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still possess considerable potential for improvement in first-person perspective tasks. Meanwhile, enlarging the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.

Keywords: Benchmark; Egocentric; Vision-Language Models

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021