ISBN (print): 9798350301298
Movie story analysis requires understanding characters' emotions and mental states. Towards this goal, we formulate emotion understanding as predicting a diverse and multi-label set of emotions at the level of a movie scene and for each character. We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. By leveraging annotations from the MovieGraphs dataset [72], we aim to predict classic emotions (e.g. happy, angry) and other mental states (e.g. honest, helpful). We conduct experiments on the most frequently occurring 10 and 25 labels, and a mapping that clusters 181 labels to 26. Ablation studies and comparison against adapted state-of-the-art emotion recognition approaches show the effectiveness of EmoTx. Analyzing EmoTx's self-attention scores reveals that expressive emotions often attend to character tokens, while other mental states rely on video and dialog cues.
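To make the token layout concrete, here is a minimal sketch of a multimodal, multi-label emotion classifier in the spirit of the abstract: video, character, and dialog features are embedded into one sequence together with scene-level and per-character classification tokens, and a shared sigmoid head makes joint predictions. Dimensions, feature extractors, and names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiLabelEmotionTransformer(nn.Module):
    def __init__(self, d_model=256, n_labels=25, n_heads=4, n_layers=2):
        super().__init__()
        # Project pre-extracted features of each modality into a shared space (sizes assumed).
        self.video_proj = nn.Linear(2048, d_model)   # e.g. clip-level video features
        self.char_proj = nn.Linear(512, d_model)     # per-character face-track features
        self.dialog_proj = nn.Linear(768, d_model)   # utterance embeddings
        # Learnable classification tokens: one for the scene, one template per character slot.
        self.scene_token = nn.Parameter(torch.randn(1, 1, d_model))
        self.char_token = nn.Parameter(torch.randn(1, 1, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.head = nn.Linear(d_model, n_labels)     # shared multi-label head

    def forward(self, video_feats, char_feats, dialog_feats):
        # video_feats: (B, Tv, 2048); char_feats: (B, C, Tc, 512); dialog_feats: (B, Td, 768)
        B, C = char_feats.shape[:2]
        v = self.video_proj(video_feats)
        c = self.char_proj(char_feats).flatten(1, 2)          # (B, C*Tc, d)
        d = self.dialog_proj(dialog_feats)
        scene = self.scene_token.expand(B, 1, -1)
        chars = self.char_token.expand(B, C, -1)              # one query token per character
        tokens = torch.cat([scene, chars, v, c, d], dim=1)
        out = self.encoder(tokens)
        scene_logits = self.head(out[:, 0])                   # (B, n_labels)
        char_logits = self.head(out[:, 1:1 + C])              # (B, C, n_labels)
        # Multi-label training would apply BCEWithLogitsLoss to both outputs.
        return scene_logits, char_logits
```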
ISBN (print): 9798350301298
Interpreting and explaining the behavior of deep neural networks is critical for many tasks. Explainable AI provides a way to address this challenge, mostly by providing per-pixel relevance to the decision. Yet, interpreting such explanations may require expert knowledge. Some recent attempts toward interpretability adopt a concept-based framework, giving a higher-level relationship between some concepts and model decisions. This paper proposes Bottleneck Concept Learner (BotCL), which represents an image solely by the presence/absence of concepts learned through training over the target task without explicit supervision over the concepts. It uses self-supervision and tailored regularizers so that learned concepts can be human-understandable. Using some image classification tasks as our testbed, we demonstrate BotCL's potential to rebuild neural networks for better interpretability.
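The defining property described here is that the classifier sees only per-concept presence scores rather than raw features. The sketch below illustrates that bottleneck under assumed layer sizes and a simple dot-product attention over spatial tokens; it is not the BotCL architecture itself.

```python
import torch
import torch.nn as nn

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, n_concepts=20, n_classes=10, d=512):
        super().__init__()
        self.backbone = nn.Conv2d(3, d, kernel_size=7, stride=4)  # stand-in feature extractor
        self.concepts = nn.Parameter(torch.randn(n_concepts, d))  # learned concept prototypes
        self.classifier = nn.Linear(n_concepts, n_classes, bias=False)

    def forward(self, x):
        feat = self.backbone(x)                                   # (B, d, H, W)
        feat = feat.flatten(2).transpose(1, 2)                    # (B, HW, d) spatial tokens
        # How strongly each concept prototype responds anywhere in the image.
        attn = torch.einsum("bnd,kd->bnk", feat, self.concepts)   # (B, HW, K)
        presence = torch.sigmoid(attn).amax(dim=1)                # (B, K) presence/absence scores
        logits = self.classifier(presence)                        # decision depends only on concepts
        return logits, presence
```

Self-supervision and the tailored regularizers mentioned in the abstract (e.g. concept consistency or diversity terms) would be added as extra losses on `presence` and `self.concepts`.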
ISBN (print): 9798350365474
Handwritten Document Recognition (HDR) has emerged as a challenging task integrating text and layout recognition to tackle manuscripts end-to-end. Despite advancements, the computational efficiency of processing entire documents remains a critical challenge, limiting the practical applicability of these models. This paper presents the Document Attention Network for Computationally Efficient Recognition (DANCER). The model differs from existing approaches through its encoder-decoder structure, where the encoder reduces spatial redundancy and enhances spatial attention, and the decoder, comprising transformer layers, efficiently decodes the text using optimized attention operations. This design results in a fast, memory-efficient model capable of effectively transcribing and understanding complex manuscript layouts. We evaluate DANCER's efficacy on the ICFHR 2016 READ competition dataset, focusing on recognizing single- and double-page historical documents. We demonstrate how DANCER can triple the training batch size compared to prior models within the same memory limits and reduce memory usage by up to 65% without compromising recognition quality. The proposed approach sets new standards in efficiency and accuracy for HDR solutions, paving the way for practical and scalable applications in diverse contexts.
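As a rough picture of the encoder-decoder split described above, the skeleton below downsamples the page aggressively to reduce spatial redundancy and then decodes characters autoregressively over that compact memory. All layer choices and sizes are illustrative assumptions, not the DANCER design.

```python
import torch
import torch.nn as nn

class PageRecognizer(nn.Module):
    def __init__(self, vocab_size=100, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        # Strided convolutions shrink the page image into a compact feature memory.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, d_model, 3, stride=2, padding=1),
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, page, prev_tokens):
        # page: (B, 1, H, W) grayscale scan; prev_tokens: (B, T) already-decoded symbols
        mem = self.encoder(page).flatten(2).transpose(1, 2)         # (B, H'W', d_model)
        tgt = self.embed(prev_tokens)
        T = prev_tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        dec = self.decoder(tgt, mem, tgt_mask=causal)               # causal text decoding
        return self.out(dec)                                        # per-step character logits
```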
ISBN (print): 9798350353006
Recent progress in Large Multimodal Models (LMMs) has opened up great possibilities for various applications in human-machine interaction. However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains challenging, especially given the demand for understanding permutation-invariant point cloud representations of the 3D scene. Existing works seek help from multi-view images by projecting 2D features to 3D space, which inevitably leads to huge computational overhead and performance degradation. In this paper, we present LL3DA, a Large Language 3D Assistant that takes point clouds as direct input and responds to both text instructions and visual interactions. The additional visual interaction enables LMMs to better comprehend human interactions with the 3D environment and to resolve ambiguities in plain text. Experiments show that LL3DA achieves remarkable results and surpasses various 3D vision-language models on both 3D Dense Captioning and 3D Question Answering.
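To illustrate what "point cloud as direct input" can look like in practice, here is a prefix-style conditioning sketch: the scene points (plus an optional click) are compressed into a few query tokens that would be prepended to the text embeddings of a language model. The pooling scheme, sizes, and names are assumptions, not the LL3DA architecture.

```python
import torch
import torch.nn as nn

class SceneToPrefix(nn.Module):
    def __init__(self, d_point=6, d_model=256, n_queries=32, n_heads=4):
        super().__init__()
        self.point_proj = nn.Linear(d_point, d_model)        # xyz + rgb per point (assumed)
        self.queries = nn.Parameter(torch.randn(1, n_queries, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, points, click=None):
        # points: (B, N, 6); click: optional (B, 3) coordinate of a user interaction
        feats = self.point_proj(points)
        if click is not None:
            # Encode the interaction as one extra token so it can disambiguate the text prompt.
            click_feat = self.point_proj(torch.cat([click, torch.zeros_like(click)], dim=-1))
            feats = torch.cat([feats, click_feat.unsqueeze(1)], dim=1)
        q = self.queries.expand(points.size(0), -1, -1)
        prefix, _ = self.cross_attn(q, feats, feats)          # (B, n_queries, d_model)
        return prefix

# Usage idea: lm_input = torch.cat([scene_to_prefix(points, click), text_embeds], dim=1)
```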
ISBN (print): 9798350301298
Counterfactual explanations and adversarial attacks share a related goal: flipping output labels with minimal perturbations, regardless of their characteristics. Yet, adversarial attacks cannot be used directly from a counterfactual explanation perspective, as such perturbations are perceived as noise rather than as actionable and understandable image modifications. Building on the robust learning literature, this paper proposes an elegant method to turn adversarial attacks into semantically meaningful perturbations, without modifying the classifiers being explained. The proposed approach hypothesizes that Denoising Diffusion Probabilistic Models are excellent regularizers for avoiding high-frequency and out-of-distribution perturbations when generating adversarial attacks. The paper's key idea is to build the attacks through a diffusion model, which polishes them into plausible image edits. This allows studying the target model regardless of its level of robustification. Extensive experimentation shows the advantages of our counterfactual explanation approach over the current state of the art on multiple testbeds.
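A conceptual sketch of the idea, as stated in the abstract: alternate a gradient-based adversarial step with a perturb-then-denoise pass through a diffusion model so the edit stays on the image manifold. `classifier` and `ddpm_denoise` are placeholders the reader would supply; the step sizes and schedule are arbitrary, and this is not the paper's exact algorithm.

```python
import torch

def diffusion_counterfactual(x, target_class, classifier, ddpm_denoise,
                             steps=50, lr=0.02, noise_level=0.3):
    """x: (B, C, H, W) input images; target_class: (B,) desired labels."""
    cf = x.clone()
    for _ in range(steps):
        cf = cf.detach().requires_grad_(True)
        # Adversarial step: push the classifier toward the target label.
        logits = classifier(cf)
        loss = torch.nn.functional.cross_entropy(logits, target_class)
        grad, = torch.autograd.grad(loss, cf)
        attacked = cf - lr * grad.sign()
        # Polishing step: noising then denoising with the diffusion model removes
        # high-frequency, out-of-distribution components of the attack.
        noisy = attacked + noise_level * torch.randn_like(attacked)
        with torch.no_grad():
            cf = ddpm_denoise(noisy, noise_level)
    return cf.detach()
```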
ISBN (print): 9798350353013; 9798350353006
Recent progress in human shape learning shows that neural implicit models are effective at generating 3D human surfaces from a limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as faces, hands, or cloth wrinkles. They are also prone to depth ambiguities that result in distorted geometries along the camera optical axis. In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy. Our model learns geometric details from both multi-resolution pixel-aligned and voxel-aligned features to leverage depth information and capture spatial relationships, mitigating depth ambiguities. We further enhance the quality of the reconstructed shape with a depth-supervision strategy, which improves the accuracy of the signed distance field estimation for points that lie on the reconstructed surface. Experiments demonstrate that ANIM outperforms state-of-the-art works that use RGB, surface normals, point cloud, or RGB-D data as input. In addition, we introduce ANIM-Real, a new multi-modal dataset comprising high-quality scans paired with consumer-grade RGB-D camera captures, together with our protocol to fine-tune ANIM, enabling high-quality reconstruction from real-world human capture. https://***/ANIM/
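The sketch below illustrates the combination of pixel-aligned and voxel-aligned features for implicit surface prediction: each 3D query point samples a 2D feature map and a 3D feature volume, and an MLP predicts its signed distance. The orthographic projection, feature sizes, and loss comment are simplifying assumptions, not the ANIM formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitSDF(nn.Module):
    def __init__(self, d_pix=64, d_vox=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_pix + d_vox + 3, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, pts, pix_feat, vox_feat):
        # pts: (B, P, 3) in [-1, 1]; pix_feat: (B, d_pix, H, W); vox_feat: (B, d_vox, D, H, W)
        uv = pts[..., :2].unsqueeze(2)                               # orthographic projection (assumption)
        f_pix = F.grid_sample(pix_feat, uv, align_corners=True)      # (B, d_pix, P, 1)
        f_pix = f_pix.squeeze(-1).transpose(1, 2)                    # (B, P, d_pix)
        grid = pts.view(pts.size(0), -1, 1, 1, 3)
        f_vox = F.grid_sample(vox_feat, grid, align_corners=True)    # (B, d_vox, P, 1, 1)
        f_vox = f_vox.flatten(2).transpose(1, 2)                     # (B, P, d_vox)
        return self.mlp(torch.cat([f_pix, f_vox, pts], dim=-1))      # (B, P, 1) signed distances

# Depth-supervision idea from the abstract: points back-projected from the depth map
# lie on the surface, so their predicted signed distance is regressed toward zero.
```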
ISBN (print): 9798350365474
Conversational facial expression recognition entails challenges such as handling facial dynamics, small available datasets, low-intensity and fine-grained emotional expressions, and extreme face angles. Towards addressing these challenges, we propose Masking Action Units and Reconstructing multiple Angles (MAURA) pre-training. MAURA is an efficient self-supervised method that permits the use of small datasets while preserving end-to-end conversational facial expression recognition with a vision Transformer. MAURA masks videos at the locations of active Action Units and reconstructs synchronized multi-view videos, thus learning the dependencies between muscle movements and encoding information that might only be visible in a few frames and/or in certain views. Based on one view (e.g., frontal), the encoder reconstructs other views (e.g., top, down, lateral). This masking-and-reconstruction strategy provides a powerful representation that is beneficial for downstream facial expression tasks. Our experimental analysis shows that we consistently outperform the state of the art in the challenging settings of low-intensity and fine-grained conversational facial expression recognition on four datasets, including the in-the-wild DFEW, CMU-MOSEI, MFA, and multi-view MEAD. Our results suggest that MAURA learns robust and generic video representations.
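A small sketch of the AU-guided masking idea described above: patches covering active Action Units are the ones hidden from the encoder, and a decoder would be trained to reconstruct the clip from other viewpoints. The patching, heatmap source, and masking ratio are assumptions for illustration.

```python
import torch

def au_guided_mask(tokens, au_heatmap, mask_ratio=0.75):
    """tokens: (B, N, D) patch tokens; au_heatmap: (B, N) Action Unit activity per patch."""
    B, N, D = tokens.shape
    n_mask = int(N * mask_ratio)
    # Hide the patches with the strongest Action Unit activity first.
    order = au_heatmap.argsort(dim=1, descending=True)
    masked_idx = order[:, :n_mask]                     # targets for reconstruction
    visible_idx = order[:, n_mask:]                    # what the encoder actually sees
    visible = torch.gather(tokens, 1, visible_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, masked_idx

# Training idea (assumption): encode `visible` from the frontal view, then have a light
# decoder reconstruct the masked patches of the top/down/lateral views with an L2 loss,
# so muscle-movement cues transfer across views.
```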
ISBN (print): 9798350353006
Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is to fine-tune a model for a specific scenario, but this is computationally intensive and requires multiple model copies for various scenarios. Recent studies indicate that large language models (LLMs) can learn from a few demonstration examples in a training-free manner, termed "In-Context Learning" (ICL). Nevertheless, applying LLMs as text recognizers is unacceptably resource-consuming. Moreover, our pilot experiments on LLMs show that ICL fails in STR, mainly due to the insufficient incorporation of contextual information from diverse samples in the training stage. To this end, we introduce E2STR, an STR model trained with context-rich scene text sequences, where the sequences are generated via our proposed in-context training strategy. E2STR demonstrates that a regular-sized model is sufficient to achieve effective ICL capabilities in STR. Extensive experiments show that E2STR exhibits remarkable training-free adaptation in various scenarios and outperforms even the fine-tuned state-of-the-art approaches on public benchmarks. The code is released at https://***/bytedance/E2STR.
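A schematic of what a "context-rich scene text sequence" might look like: several demonstration samples are chained as interleaved image and transcription tokens ahead of the query image, so the recognizer can exploit in-context examples without fine-tuning. Token shapes and the separator scheme are assumptions, not the E2STR recipe.

```python
import torch

def build_in_context_sequence(image_tokens, text_tokens, sep_token):
    """image_tokens / text_tokens: lists of (Li, D) tensors for the sampled demos plus
    the query (last element); sep_token: (1, D). Returns one (L, D) input sequence."""
    chunks = []
    for img, txt in zip(image_tokens[:-1], text_tokens[:-1]):
        chunks += [img, txt, sep_token]        # demo: image followed by its transcription
    chunks.append(image_tokens[-1])            # query image; its transcription is the target
    return torch.cat(chunks, dim=0)

# Usage sketch: at test time, a handful of labeled samples from the new domain are
# tokenized the same way and prepended to the query, giving training-free adaptation.
```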
ISBN (print): 9798350353006
Zero-Shot Temporal Action Localization (ZS-TAL) seeks to identify and locate actions in untrimmed videos unseen during training. Existing ZS-TAL methods involve fine-tuning a model on a large amount of annotated training data. While effective, training-based ZS-TAL approaches assume the availability of labeled data for supervised learning, which can be impractical in some applications. Furthermore, the training process naturally induces a domain bias in the learned model, which may adversely affect its ability to generalize to arbitrary videos. These considerations prompt us to approach the ZS-TAL problem from a radically new perspective, relaxing the requirement for training data. To this end, we introduce a novel method that performs Test-Time adaptation for Temporal Action Localization (T3AL). In a nutshell, T3AL adapts a pre-trained Vision and Language Model (VLM) and operates in three steps. First, a video-level pseudo-label of the action category is computed by aggregating information from the entire video. Then, action localization is performed by adopting a novel procedure inspired by self-supervised learning. Finally, frame-level textual descriptions extracted with a state-of-the-art captioning model are employed to refine the action region proposals. We validate the effectiveness of T3AL by conducting experiments on the THUMOS14 and ActivityNet-v1.3 datasets. Our results demonstrate that T3AL significantly outperforms zero-shot baselines based on state-of-the-art VLMs, confirming the benefit of a test-time adaptation approach.
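The first two steps can be pictured with the simplified sketch below: frame/class similarities from a VLM are averaged to obtain the video-level pseudo-label, and high-scoring frames for that label are grouped into candidate segments. The thresholding and grouping rules are assumptions, not the T3AL procedure.

```python
import torch

def pseudo_label_and_propose(frame_feats, class_text_feats, threshold=0.5):
    """frame_feats: (T, D) normalized frame embeddings from a VLM image encoder;
    class_text_feats: (C, D) normalized class-name text embeddings."""
    sims = frame_feats @ class_text_feats.t()                  # (T, C) frame/class similarity
    video_label = sims.mean(dim=0).argmax().item()             # step 1: video-level pseudo-label
    scores = sims[:, video_label]
    keep = scores > scores.mean() + threshold * scores.std()   # step 2: salient frames
    # Merge consecutive kept frames into (start, end) proposals.
    proposals, start = [], None
    for t, k in enumerate(keep.tolist()):
        if k and start is None:
            start = t
        elif not k and start is not None:
            proposals.append((start, t - 1))
            start = None
    if start is not None:
        proposals.append((start, len(keep) - 1))
    return video_label, proposals
```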
ISBN (print): 9798350301298
We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language. In the original Localized Narratives [36], annotators speak and move their mouse simultaneously on an image, thus grounding each word with a mouse trace segment. However, this is challenging on a video. Our new protocol empowers annotators to tell the story of a video with Localized Narratives, capturing even complex events involving multiple actors interacting with each other and with several passive objects. We annotated 20k videos of the OVIS, UVO, and Oops datasets, totalling 1.7M words. Based on this data, we also construct new benchmarks for the video narrative grounding and video question answering tasks, and provide reference results from strong baseline models. Our annotations are available at https://***/video-localized-narratives/.
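For readers who want to picture the annotation structure, here is a plausible record layout in which each spoken word is paired with the mouse-trace segment drawn while it was uttered. Field names are illustrative, not the released schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GroundedWord:
    word: str
    start_time: float                      # seconds into the narration audio
    end_time: float
    trace: List[Tuple[float, float]]       # normalized (x, y) mouse points for this word

@dataclass
class VideoLocalizedNarrative:
    video_id: str
    actor: str                             # which actor this narrative describes
    words: List[GroundedWord]
```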