ISBN (print): 9798350365474
Inspired by the remarkable progress achieved by recent Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) take LLMs as their brains and have achieved surprising results in many downstream tasks by training on large amounts of task-specific data. However, when faced with complex tasks that require the collaboration of multiple capabilities, existing MLLMs re-collect training data and retrain the model, overlooking the systematic utilization of LLMs and the capabilities already learned in downstream tasks. Inspired by the way humans tackle complex questions, in this paper we propose a novel framework called Task Navigator. In our framework, LLMs act as navigators that chart a viable path for solving complex tasks and guide MLLMs through the process step by step. Specifically, the LLM iteratively breaks the task down into sub-problems and refines them to be more reasonable and answerable; these are then resolved by the MLLM to obtain the relevant sub-answers, until the LLM has collected enough information to answer the initial question. Task Navigator provides an effective way to extend MLLMs to complex tasks, thus broadening their applicability. To evaluate the performance of the proposed framework, we have curated a carefully designed benchmark called VersaChallenge. Experiments on VersaChallenge demonstrate the effectiveness of our proposed method.
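To make the navigator loop concrete, here is a minimal Python sketch of the iterative decompose-answer-check cycle described above. The llm and mllm objects and all of their methods (propose_subquestion, refine, answer, has_enough_information, final_answer) are hypothetical placeholders, since the abstract specifies neither the prompts nor the stopping criterion.

def task_navigator(question, image, llm, mllm, max_steps=5):
    """The LLM charts a path of sub-problems; the MLLM answers each one."""
    history = []  # (sub_question, sub_answer) pairs collected so far
    for _ in range(max_steps):
        # 1. The LLM breaks off the next sub-problem given progress so far.
        sub_q = llm.propose_subquestion(question, history)
        # 2. The LLM refines it to be more reasonable and answerable.
        sub_q = llm.refine(sub_q)
        # 3. The MLLM grounds the refined sub-question in the image.
        sub_a = mllm.answer(image, sub_q)
        history.append((sub_q, sub_a))
        # 4. Stop once the LLM has collected enough information.
        if llm.has_enough_information(question, history):
            break
    return llm.final_answer(question, history)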
ISBN (print): 9798350301298
Pre-trained vision-language models (VLMs) learn to align vision and language representations on large-scale datasets, where each image-text pair usually contains a bag of semantic concepts. However, existing open-vocabulary object detectors only align region embeddings individually with the corresponding features extracted from the VLMs. Such a design leaves the compositional structure of semantic concepts in a scene under-exploited, even though this structure may be implicitly learned by the VLMs. In this work, we propose to align the embedding of a bag of regions, going beyond individual regions. The proposed method groups contextually interrelated regions into a bag. The embeddings of the regions in a bag are treated as the embeddings of words in a sentence and are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is trained to align with the corresponding features extracted by a frozen VLM. Applied to the commonly used Faster R-CNN, our approach surpasses the previous best results by 4.6 box AP50 and 2.8 mask AP on the novel categories of the open-vocabulary COCO and LVIS benchmarks, respectively. Code and models are available at https://***/wusize/ovdet.
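As a rough illustration of the alignment step, the sketch below treats projected region embeddings as the pseudo-words of a sentence, encodes them with the VLM's text encoder, and pulls the resulting bag embedding toward the frozen VLM's feature for the bag's image crop. The cosine objective and the text_encoder interface are assumptions for illustration, not necessarily the paper's exact loss.

import torch.nn.functional as F

def bag_of_regions_loss(region_embeds, text_encoder, frozen_vlm_feat):
    # region_embeds:   (num_regions, dim) region embeddings projected into
    #                  the text encoder's word-embedding space
    # frozen_vlm_feat: (dim,) frozen-VLM feature of the bag's image crop
    # Treat the bag of regions as a pseudo-sentence of word embeddings.
    bag_embed = text_encoder(region_embeds.unsqueeze(0)).squeeze(0)
    # Pull the bag embedding toward the frozen VLM feature (cosine distance).
    bag_embed = F.normalize(bag_embed, dim=-1)
    target = F.normalize(frozen_vlm_feat, dim=-1)
    return 1.0 - (bag_embed * target).sum()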
ISBN (print): 9798350301298
Complex relationships of high arity across the modality and context dimensions are a critical challenge in the Emotion Recognition in Conversation (ERC) task. Yet previous works tend to encode multimodal and contextual relationships in a loosely coupled manner, which may harm relationship modelling. Recently, Graph Neural Networks (GNNs), which show advantages in capturing data relations, have offered a new solution for ERC. However, existing GNN-based ERC models fail to address some general limitations of GNNs, including the assumption of pairwise relations and the erasure of high-frequency signals, which may be trivial for many applications but are crucial for the ERC task. In this paper, we propose a GNN-based model that explores multivariate relationships and captures the varying importance of emotion discrepancy and commonality by valuing multi-frequency signals. We empower GNNs to better capture the inherent relationships among utterances and to deliver more sufficient multimodal and contextual modelling. Experimental results show that our proposed method outperforms previous state-of-the-art works on two popular multimodal ERC datasets.
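One way to read "valuing multi-frequency signals" is as a mix of low-pass and high-pass graph filters, so the model can trade off emotion commonality (smoothing across neighbours) against emotion discrepancy (contrast between neighbours). The sketch below is such a mixer under that assumption; the gate alpha and the normalized adjacency adj are illustrative, not the paper's exact design. It runs on torch tensors (or NumPy arrays) as-is.

def multi_frequency_propagate(x, adj, alpha):
    # x:     (num_nodes, dim) utterance features
    # adj:   (num_nodes, num_nodes) normalized adjacency matrix A_hat
    # alpha: scalar (or per-node) gate in [0, 1]
    low = adj @ x        # low-pass: smooths neighbours -> emotion commonality
    high = x - adj @ x   # high-pass: neighbour contrast -> emotion discrepancy
    return alpha * low + (1 - alpha) * high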
ISBN (print): 9798350301298
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. In this challenging regime, 3D scene points are typically observed only once, requiring prior-based reconstruction of scene geometry and appearance. We find that existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry and because the high cost of differentiable rendering precludes scaling them to large-scale training. We take a step towards resolving these shortcomings by formulating a multi-view transformer encoder, proposing an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray, and introducing a lightweight cross-attention-based renderer. Our contributions enable training of our method on a large-scale real-world dataset of indoor and outdoor scenes. We demonstrate that our method learns powerful multi-view geometry priors while reducing rendering time. We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.
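A minimal sketch of the lightweight cross-attention-based renderer, assuming a target-ray query token attends over features gathered at sample points along its epipolar line; the dimensions and the upstream feature-sampling step are placeholders, not the paper's exact architecture.

import torch.nn as nn

class EpipolarCrossAttentionRenderer(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, ray_query, epipolar_feats):
        # ray_query:      (B, 1, dim) embedding of the target ray
        # epipolar_feats: (B, S, dim) features sampled at S points along the
        #                 ray's epipolar line (image-space, no 3D volume)
        out, _ = self.attn(ray_query, epipolar_feats, epipolar_feats)
        return self.to_rgb(out.squeeze(1))  # (B, 3) predicted colour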
ISBN (print): 9798350302493
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates high-quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at ***/arpitbansal297/UniversalGuided-Diffusion.
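A plausible reading of guidance without retraining, sketched below: run the frozen noise predictor, form the denoised estimate with the standard DDPM identity, score it with any differentiable guidance function (segmenter, face recogniser, detector, classifier), and nudge the predicted noise along the gradient. The gradient coefficient and the interfaces are illustrative assumptions.

import torch

def guided_noise(x_t, t, eps_model, alpha_bar_t, guidance_fn, scale):
    # Make x_t a leaf tensor so the guidance loss can be differentiated
    # with respect to it.
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Denoised estimate from x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps.
    x0_hat = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    loss = guidance_fn(x0_hat)  # e.g. -log p(target | x0_hat)
    grad, = torch.autograd.grad(loss, x_t)
    # Steer the predicted noise along the guidance gradient.
    return eps + scale * (1 - alpha_bar_t) ** 0.5 * grad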
ISBN (print): 9798350301298
Micro-expressions are spontaneous, rapid, and subtle facial movements that can neither be forged nor suppressed. They are very important nonverbal communication cues, but they are transient and of low intensity, and thus difficult to recognize. Recently, deep-learning-based methods have been developed for micro-expression (ME) recognition using feature extraction and fusion techniques; however, feature learning targeted at ME characteristics and efficient feature fusion still lack further study. To address these issues, we propose a novel framework, Feature Representation Learning with adaptive Displacement Generation and Transformer fusion (FRL-DGT). A convolutional Displacement Generation Module (DGM) trained with self-supervised learning extracts dynamic features from onset/apex frames, targeted at the subsequent ME recognition task, and a well-designed Transformer fusion mechanism, composed of three Transformer-based fusion modules (local and global fusion based on AU regions, plus full-face fusion), extracts multi-level informative features after the DGM for the final ME prediction. Extensive experiments with solid leave-one-subject-out (LOSO) evaluation demonstrate the superiority of our proposed FRL-DGT over state-of-the-art methods.
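The displacement idea can be sketched as follows: a small convolutional net predicts a dense (dx, dy) field from the onset/apex pair, and the self-supervision signal reconstructs the apex frame by warping the onset frame with that field. Layer sizes and the L1 reconstruction loss are illustrative assumptions, not the paper's exact DGM.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DisplacementGenerationModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # per-pixel (dx, dy)
        )

    def forward(self, onset, apex):  # each (B, 3, H, W)
        return self.net(torch.cat([onset, apex], dim=1))  # (B, 2, H, W)

def warp_loss(onset, apex, disp):
    # Displacements are taken in normalized [-1, 1] coordinates here.
    b, _, h, w = onset.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
    grid = base + disp.permute(0, 2, 3, 1)
    recon = F.grid_sample(onset, grid, align_corners=True)
    return F.l1_loss(recon, apex)  # self-supervised reconstruction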
ISBN (print): 9798350365474
Retrieval-augmented generation (RAG) is used in natural language processing (NLP) to provide query-relevant information from enterprise documents to large language models (LLMs). Such enterprise context enables the LLMs to generate more informed and accurate responses. When enterprise data is primarily video, AI models such as vision-language models (VLMs) are necessary to convert the information in videos into text. While essential, this conversion is a bottleneck, especially for a large corpus of videos, and delays the timely use of enterprise videos to generate useful responses. We propose ViTA, a novel method that leverages two unique characteristics of VLMs to expedite the conversion process. First, as VLMs output more text tokens, they incur higher latency. Second, large (heavyweight) VLMs can extract intricate details from images and videos, but they incur much higher latency per output token compared to smaller (lightweight) VLMs, which may miss details. To expedite conversion, ViTA first employs a lightweight VLM to quickly understand the gist or overview of an image or a video clip, then directs a heavyweight VLM (through prompt engineering) to extract additional details using only a few (a preset number of) output tokens. Our experimental results show that ViTA expedites conversion time by as much as 43% without compromising response accuracy, compared to a baseline system that uses only a heavyweight VLM.
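The two-stage conversion can be sketched in a few lines. The light_vlm and heavy_vlm callables, the prompts, and the token cap are hypothetical stand-ins for whatever models and prompt engineering a deployment would use.

def vita_convert(clip, light_vlm, heavy_vlm, max_new_tokens=64):
    # Stage 1: a lightweight VLM quickly produces the gist of the clip.
    gist = light_vlm(clip, prompt="Briefly describe this video clip.")
    # Stage 2: the heavyweight VLM is steered by the gist and capped at a
    # preset number of output tokens, since latency grows with token count.
    prompt = ("The clip is roughly about: " + gist + "\n"
              "Add only important details not covered above.")
    details = heavy_vlm(clip, prompt=prompt, max_new_tokens=max_new_tokens)
    return gist + " " + details  # text handed to the RAG index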
ISBN (print): 9798350301298
Spatiotemporal predictive learning aims to generate future frames by learning from historical frames. In this paper, we investigate existing methods and present a general framework for spatiotemporal predictive learning in which a spatial encoder and decoder capture intra-frame features and a middle temporal module captures inter-frame correlations. While mainstream methods employ recurrent units to capture long-term temporal dependencies, they suffer from low computational efficiency due to their unparallelizable architectures. To parallelize the temporal module, we propose the Temporal Attention Unit (TAU), which decomposes temporal attention into intra-frame static attention and inter-frame dynamic attention. Moreover, while the mean squared error loss focuses on intra-frame errors, we introduce a novel differential divergence regularization that takes inter-frame variations into account. Extensive experiments demonstrate that the proposed method enables the derived model to achieve competitive performance on various spatiotemporal prediction benchmarks.
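The decomposition can be sketched as a large-receptive-field convolution within each frame (static attention) multiplied by a data-dependent channel gate (dynamic attention), which keeps everything parallelizable across frames. The kernel sizes and gating design below follow common large-kernel-attention practice and are assumptions, not the paper's exact configuration.

import torch.nn as nn

class TemporalAttentionUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Static: large intra-frame receptive field via depthwise convs.
        self.static = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.Conv2d(channels, channels, 7, padding=9, dilation=3,
                      groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        # Dynamic: data-dependent channel attention across frames.
        self.dynamic = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):  # x: (B, C, H, W) stacked frame features
        return self.static(x) * self.dynamic(x)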
ISBN (print): 9798350301298
Most existing works on the Room-to-Room VLN problem utilize only RGB images and do not consider the local context around candidate views, and thus lack sufficient visual cues about the surrounding environment. Moreover, natural language contains complex semantic information, so its correlations with visual inputs are hard to model with cross attention alone. In this paper, we propose GeoVLN, which learns geometry-enhanced visual representations based on slot attention for robust Visual-and-Language Navigation. The RGB images are complemented with corresponding depth maps and normal maps predicted by Omnidata as visual inputs. Technically, we introduce a two-stage module that combines local slot attention and the CLIP model to produce geometry-enhanced representations from such inputs. We employ V&L BERT to learn a cross-modal representation that incorporates both language and vision information. Additionally, a novel multiway attention module is designed, encouraging different phrases of the input instruction to exploit the most related features from the visual input. Extensive experiments demonstrate the effectiveness of our newly designed modules and show the compelling performance of the proposed method.
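The multiway attention idea can be sketched as one cross-attention branch per visual source (RGB, depth, normals) plus a per-phrase softmax gate that picks the most related branch. Head counts and the gating form are illustrative assumptions, not the paper's exact module.

import torch
import torch.nn as nn

class MultiwayAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True)
             for _ in range(3)])
        self.gate = nn.Linear(dim, 3)

    def forward(self, phrases, rgb, depth, normal):
        # phrases: (B, P, dim); rgb/depth/normal: (B, N, dim)
        outs = [attn(phrases, feats, feats)[0]
                for attn, feats in zip(self.branches, (rgb, depth, normal))]
        w = torch.softmax(self.gate(phrases), dim=-1)  # (B, P, 3) gates
        return sum(w[..., i:i + 1] * outs[i] for i in range(3))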
ISBN (print): 9798350301298
Establishing pixel-level matches between image pairs is vital for a variety of computer vision applications. However, achieving robust image matching remains challenging because CNN-extracted descriptors usually lack discriminative ability in texture-less regions, and keypoint detectors are only good at identifying keypoints at a specific level of structure. To deal with these issues, we propose a novel image matching method, Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-based Transformers (D2Former), which includes a contextual feature descriptor learning (CFDL) module and a hierarchical keypoint detector learning (HKDL) module. The proposed D2Former enjoys several merits. First, the CFDL module can model long-range contexts efficiently and effectively with the aid of the designed descriptor agents. Second, the HKDL module can generate keypoint detectors in a hierarchical way, which is helpful for detecting keypoints with diverse levels of structure. Extensive experimental results on four challenging benchmarks show that our proposed method significantly outperforms state-of-the-art image matching methods.
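A sketch of what "descriptor agents" could look like: a small set of learnable agent tokens summarises the dense feature map (agents attend to pixels), then every pixel reads the summary back (pixels attend to agents), giving long-range context at O(N x num_agents) rather than O(N^2) cost. The module below is an assumption-laden illustration, not the paper's exact CFDL design.

import torch
import torch.nn as nn

class DescriptorAgents(nn.Module):
    def __init__(self, dim, num_agents=16, heads=4):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(num_agents, dim))
        self.collect = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spread = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):  # feats: (B, N, dim) flattened pixel features
        agents = self.agents.expand(feats.size(0), -1, -1)
        summary, _ = self.collect(agents, feats, feats)    # agents <- pixels
        context, _ = self.spread(feats, summary, summary)  # pixels <- agents
        return feats + context  # contextual descriptors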