Search Results - Inner Mongolia University Library




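Help

Field codes:

T=Title (book or document title), A=Author (responsible party), K=Subject term, P=Publication name, PU=Publisher name, O=Institution (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=Subject classification code, U=All fields, Y=Year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase and must have one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field "C++" or "Basic", excluding subject term "Visual", years 2011-2016)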
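The field codes and operator rules above can also be applied programmatically, for example to assemble long queries before pasting them into the search box. A minimal sketch, assuming plain string assembly is all that is needed; the helpers `term`, `any_of` and `combine` are hypothetical and not part of any library or catalog interface.

```python
# Field codes listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def combine(first: str, *rest: tuple[str, str]) -> str:
    """Chain clauses as 'first OP clause OP clause ...'.

    Operators are upper-cased and surrounded by single spaces,
    as the search rules require.
    """
    parts = [first]
    for op, clause in rest:
        op = op.upper()
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f" {op} {clause}")
    return "".join(parts)


# Rebuild Example 1 from the help text.
query = combine(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
print(query)  # (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
```

Validating field codes and operators locally catches the most common syntax mistakes (lowercase operators, missing spaces, unknown codes) before a query is submitted.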


Refine Results

Document Type

Conference papers (6,506)
Journal articles (29)
Books (10)

Holdings Scope

Electronic documents (6,544)
Print holdings (1)


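Subject Classification

Engineering (3,906)
- Computer Science and Technology... (3,683)
- Software Engineering (1,504)
- Optical Engineering (802)
- Information and Communication Engineering (359)
- Control Science and Engineering (256)
- Mechanical Engineering (204)
- Electrical Engineering (176)
- Biomedical Engineering (may award... (95)
- Bioengineering (79)
- Electronic Science and Technology (may... (77)
- Instrument Science and Technology (61)
- Architecture (38)
- Mechanics (may award Engineering, Sci... (36)
- Civil Engineering (34)
- Aeronautical and Astronautical Science and Tech... (31)
- Safety Science and Engineering (31)
- Materials Science and Engineering (may... (22)
- Transportation Engineering (22)
Science (1,537)
- Physics (980)
- Mathematics (960)
- Statistics (may award Science, ... (376)
- Biology (152)
- Systems Science (34)
- Chemistry (24)
Management (168)
- Library, Information and Archives Manag... (120)
- Management Science and Engineering (may... (50)
- Business Administration (33)
Medicine (139)
- Clinical Medicine (138)
- Basic Medicine (may award Medicine... (21)
Law (20)
- Sociology (19)
Agriculture (11)
Education (10)
Economics (7)
Military Science (3)
Art (3)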

Topics

computer vision (2,304)
pattern recognit... (871)
cameras (640)
computer science (634)
face recognition (577)
layout (553)
image segmentati... (523)
conferences (509)
shape (503)
robustness (450)
object recogniti... (438)
humans (390)
feature extracti... (341)
training (318)
object detection (307)
application soft... (261)
image recognitio... (259)
lighting (250)
image reconstruc... (239)
computational mo... (237)

Institutions

microsoft resear... (44)
department of co... (26)
swiss fed inst t... (21)
school of comput... (21)
department of co... (20)
swiss fed inst t... (19)
carnegie mellon ... (19)
department of co... (18)
department of in... (17)
the robotics ins... (17)
institute of com... (17)
univ sci & techn... (16)
robotics institu... (16)
tsinghua univ pe... (15)
department of el... (14)
school of comput... (14)
school of comput... (14)
univ maryland co... (13)
microsoft resear... (13)
microsoft resear... (13)

Authors

timofte radu (39)
s.k. nayar (28)
xiaoou tang (25)
huang thomas s. (25)
t. kanade (22)
t.s. huang (20)
van gool luc (19)
t. darrell (19)
chellappa rama (19)
nayar shree k. (18)
a.k. jain (17)
a. zisserman (17)
jain anil k. (17)
zisserman andrew (17)
g. healey (16)
torralba antonio (16)
heung-yeung shum (16)
l. van gool (16)
zhang lei (15)
li stan z. (15)

Language

English (6,544)
Other (1)

Search condition: "Any field=2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005"

6,545 records in total; showing records 101-110


AIGeN: An Adversarial Approach for Instruction Generation in VLN


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rawal, Niyati; Bigazzi, Roberto; Baraldi, Lorenzo; Cucchiara, Rita (Univ Modena & Reggio Emilia, Modena, Italy)

ISBN: (print) 9798350365474

In the last few years, research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal. Recent work in the literature focuses on different ways to augment the available instruction datasets with synthetic training data to improve navigation performance. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve navigation agents' performance. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the training phase, the decoder generates sentences for a sequence of images describing the agent's path to a particular point, while the encoder discriminates between real and fake instructions. Experimentally, we evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we generate synthetic instructions for 217K trajectories using AIGeN on the Habitat-Matterport 3D Dataset (HM3D) and show an improvement in the performance of an off-the-shelf VLN method. The validation analysis, conducted on REVERIE and R2R, highlights the promising aspects of our proposal, which achieves state-of-the-art performance.

Keywords: Generative Adversarial Networks; Text Generation; Vision-and-Language Navigation


Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Englert, Bruno B.; Piva, Fabrizio J.; Kerssies, Tommie; de Geus, Daan; Dubbelman, Gijs (Eindhoven Univ Technol, Eindhoven, Netherlands)

ISBN: (print) 9798350365474

Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4x speed increase over the prior non-VFM state of the art, while also improving performance by +1.2 mIoU in the UDA setting and by +6.1 mIoU in terms of out-of-distribution generalization. Moreover, when we use a VFM with 3.6x more parameters, the VFM-UDA approach maintains a 3.3x speedup, while improving the UDA performance by +3.1 mIoU and the out-of-distribution performance by +10.3 mIoU. These results underscore the significant benefits of combining VFMs with UDA, setting new standards and baselines for Unsupervised Domain Adaptation in semantic segmentation. The implementation is available at https://***/tue-mps/vfmuda.

Keywords: foundation model; generalization; semantic segmentation; unsupervised domain adaptation; vision foundation model
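The gains in this record are reported in mIoU (mean intersection over union), the usual semantic segmentation metric, expressed in percentage points. A minimal sketch of how mIoU is commonly computed from flattened per-pixel labels; the function name and the convention of skipping classes absent from both prediction and ground truth are assumptions, not taken from the paper or its repository.

```python
from collections import Counter


def mean_iou(pred: list[int], target: list[int], num_classes: int) -> float:
    """Mean intersection over union across classes.

    pred and target are flattened per-pixel class labels of equal length.
    Classes that appear in neither pred nor target are skipped so they
    do not distort the mean (one common convention, assumed here).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Confusion counts: (true_class, predicted_class) -> number of pixels.
    confusion = Counter(zip(target, pred))
    ious = []
    for c in range(num_classes):
        tp = confusion[(c, c)]
        fp = sum(n for (t, p), n in confusion.items() if p == c and t != c)
        fn = sum(n for (t, p), n in confusion.items() if t == c and p != c)
        union = tp + fp + fn
        if union == 0:
            continue  # class absent everywhere: skip it
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
print(round(mean_iou([0, 1, 1, 1], [0, 0, 1, 1], 2) * 100, 2))  # → 58.33
```

Under this convention, a "+1.2 mIoU" improvement means the mean of the per-class IoUs rose by 1.2 points on the 0-100 scale.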


CAGE: Circumplex Affect Guided Expression Inference


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wagner, Niklas; Maetzler, Felix; Vossberg, Samed R.; Schneider, Helen; Pavlitska, Svetlana; Zoellner, J. Marius (Karlsruhe Inst Technol KIT, Karlsruhe, Germany; FZI Res Ctr Informat Technol, Karlsruhe, Germany)

ISBN: (print) 9798350365474

Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scale MaxViT-based model architecture, we evaluate the impact of discrete expression category labels in training with the continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels helps to significantly improve expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal, with a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: https://***/wagner-niklas/CAGE_expression_inference.

Keywords: computer vision; Expression Inference; Transformer
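In the circumplex model, each expression is a point with a continuous valence coordinate and a continuous arousal coordinate, and the record's headline result is an RMSE reduction on those two values. A minimal sketch of per-dimension RMSE over (valence, arousal) pairs; the function name and data layout are illustrative assumptions, not the CAGE training scripts.

```python
import math


def rmse_per_dimension(
    preds: list[tuple[float, float]], targets: list[tuple[float, float]]
) -> tuple[float, float]:
    """Root-mean-square error for valence and arousal separately.

    Each item is a (valence, arousal) pair, i.e. a point in the
    two-dimensional circumplex space.
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally sized, non-empty inputs")
    n = len(preds)
    se_valence = sum((p[0] - t[0]) ** 2 for p, t in zip(preds, targets))
    se_arousal = sum((p[1] - t[1]) ** 2 for p, t in zip(preds, targets))
    return math.sqrt(se_valence / n), math.sqrt(se_arousal / n)


preds = [(0.5, 0.2), (-0.3, 0.6)]
targets = [(0.4, 0.2), (-0.3, 0.2)]
print(rmse_per_dimension(preds, targets))
```

Reporting valence and arousal separately keeps an error in one axis from hiding behind a good fit on the other; a "7% lower RMSE" is then a relative reduction against the previous best model's error.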


NTIRE 2024 Challenge on Stereo Image Super-Resolution: Methods and Results


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Longguang; Guo, Yulan; Li, Juncheng; Liu, Hongda; Zhao, Yang; Wang, Yingqian; Jin, Zhi; Gu, Shuhang; Timofte, Radu (Aviation University of Air Force; Sun Yat-sen University; The Shenzhen Campus of Sun Yat-sen University, China; National University of Defense Technology, China; Shanghai University, China; University of Electronic Science and Technology of China, China; Computer Vision Lab, University of Würzburg, Germany)

ISBN: (print) 9798350365474

This paper summarizes the 3rd NTIRE challenge on stereo image super-resolution (SR), with a focus on new solutions and results. The task of this challenge is to super-resolve a low-resolution stereo image pair to a high-resolution one with a magnification factor of x4 under a limited computational budget. Compared with single-image SR, the main difficulty of this challenge lies in how to exploit additional information from the other viewpoint and how to maintain stereo consistency in the results. This challenge has 2 tracks: one on bicubic degradation and one on real degradations. In total, 108 and 70 participants successfully registered for the two tracks, respectively. In the test phase, 14 and 13 teams successfully submitted valid results with PSNR (RGB) scores better than the baseline. This challenge establishes a new benchmark for stereo image SR.

Keywords: Stereocenters
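Teams in this challenge had to beat the baseline on PSNR computed over RGB images. A minimal sketch of PSNR for 8-bit images in plain Python, plus a helper that averages the left and right views of a stereo pair; the averaging is an assumption about how a stereo score might be aggregated, not the challenge's official evaluation script, and both function names are hypothetical.

```python
import math


def psnr(img: list[int], ref: list[int], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    Pixel values are assumed to be 8-bit (0..255), with the RGB
    channels simply flattened into one list.
    """
    if len(img) != len(ref) or not img:
        raise ValueError("images must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(img, ref)) / len(img)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def stereo_psnr(left: list[int], right: list[int],
                left_ref: list[int], right_ref: list[int]) -> float:
    """Average PSNR over both views (hypothetical aggregation)."""
    return (psnr(left, left_ref) + psnr(right, right_ref)) / 2.0


# Every pixel off by 10 gives MSE = 100, i.e. about 28.13 dB.
print(round(psnr([10, 10], [0, 0]), 2))
```

Because PSNR is logarithmic in the MSE, halving the reconstruction error raises the score by roughly 6 dB, which is why small dB differences separate the ranked teams.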


ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Ning-Chi; Chang, Chi-Chih; Lin, Wei-Cheng; Taka, Endri; Marculescu, Diana; Wu, Kai-Chiang (Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan; Univ Texas Austin, Austin, TX, USA)

ISBN: (print) 9798350365474

N:M sparsity is an emerging model compression method, supported by more and more accelerators, for speeding up sparse matrix multiplication in deep neural networks. Most existing N:M sparsity methods compress neural networks with a uniform setting for all layers in a network or heuristically determine the layer-wise configuration by considering the number of parameters in each layer. However, very few methods have been designed to obtain a layer-wise customized N:M sparse configuration for vision transformers (ViTs), which usually consist of transformer blocks with the same number of parameters. In this work, to address the challenge of selecting a suitable sparse configuration for ViTs on N:M sparsity-supporting accelerators, we propose ELSA, Exploiting Layer-wise N:M Sparsity for ViTs. Considering not only all N:M sparsity levels supported by a given accelerator but also the expected throughput improvement, our methodology can reap the benefits of accelerators supporting mixed sparsity by trading off negligible accuracy loss against reductions in both memory usage and inference time for ViT models. For instance, our approach achieves a noteworthy 2.9x reduction in FLOPs for both Swin-B and DeiT-B with only marginal accuracy degradation on ImageNet. Our code is publicly available at https://***/ningchihuang/ELSA.

Keywords: Deep neural networks
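N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights (2:4, for example, leaves half the weights), which is the regular pattern sparse accelerators can exploit. A minimal magnitude-based sketch in plain Python; choosing survivors by absolute value is a common approach assumed here for illustration, and it is not ELSA's layer-wise configuration search. The function name `nm_prune` is hypothetical.

```python
def nm_prune(weights: list[float], n: int, m: int) -> list[float]:
    """Zero all but the n largest-magnitude weights in each group of m.

    The weight list length must be a multiple of m. Ties are broken by
    position (earlier weights win), because Python's sort is stable
    even with reverse=True.
    """
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned: list[float] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n entries with the largest absolute value.
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.1, -0.9, 0.4, 0.05, 0.3, 0.2, -0.7, 0.6]
print(nm_prune(row, 2, 4))  # 2:4 pattern: two survivors per group of four
```

A layer-wise scheme in the spirit of the abstract would call such a routine with a different (n, m) per layer, for example a sparser 1:4 setting on layers that tolerate it and 2:4 elsewhere; how ELSA actually picks those settings is described in the paper, not here.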


Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gupta, Anshul; Vuillecard, Pierre; Farkhondeh, Arya; Odobez, Jean-Marc (Idiap Res Inst, Martigny, Switzerland; Ecole Polytech Fed Lausanne, Lausanne, Switzerland)

ISBN: (print) 9798350365474

Contextual cues related to a person's pose and interactions with objects and other people in the scene can provide valuable information for gaze following. While existing methods have focused on dedicated cue extraction methods, in this work we investigate the zero-shot capabilities of Vision-Language Models (VLMs) for extracting a wide array of contextual cues to improve gaze following performance. We first evaluate various VLMs, prompting strategies, and in-context learning (ICL) techniques for zero-shot cue recognition performance. We then use these insights to extract contextual cues for gaze following, and investigate their impact when incorporated into a state-of-the-art model for the task. Our analysis indicates that BLIP-2 is the overall top-performing VLM and that ICL can improve performance. We also observe that VLMs are sensitive to the choice of the text prompt, although ensembling over multiple text prompts can provide more robust performance. Additionally, we discover that using the entire image along with an ellipse drawn around the target person is the most effective strategy for visual prompting. For gaze following, incorporating the extracted cues results in better generalization performance, especially when considering a larger set of cues, highlighting the potential of this approach.

Keywords: Gaze Following; Vision-Language; Zero-Shot Evaluation


Multimodal Attack Detection for Action Recognition Models


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mumcu, Furkan; Yilmaz, Yasin (Univ S Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA)

ISBN: (print) 9798350365474

Adversarial machine learning attacks on video action recognition models form a growing research area, and many effective attacks have been introduced in recent years. These attacks show that action recognition models can be breached in many ways. Hence, using these models in practice raises significant security concerns. However, very few works focus on defending against or detecting attacks. In this work, we propose a novel universal detection method that is compatible with any action recognition model. In our extensive experiments, we show that our method consistently detects various attacks against different target models with high true positive rates while maintaining very low false positive rates. Tested against four state-of-the-art attacks targeting four action recognition models, the proposed detector achieves an average AUC of 0.911 over 16 test cases, while the best performance achieved by existing detectors is an average AUC of 0.645. This 41.2% improvement is enabled by the robustness of the proposed detector to varying attack methods and target models. The lowest AUC achieved by our detector across the 16 test cases is 0.837, while the competing detector's performance drops as low as 0.211. We also show that the proposed detector is robust to varying attack strengths. In addition, we analyze our method's real-time performance with different hardware setups to demonstrate its potential as a practical defense mechanism.

Keywords: Action Recognition Models; Adversarial machine learning attacks; Attack detection
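The detectors in this record are compared by AUC, the area under the ROC curve. AUC equals the probability that a randomly chosen attacked (positive) input gets a higher detection score than a randomly chosen clean (negative) one, with ties counting half. A minimal pairwise sketch in plain Python, quadratic in the number of samples and meant only for illustration; it is a generic metric, not the authors' evaluation code, and the function name `auc` is hypothetical.

```python
def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    pos_scores: detector scores on attacked inputs.
    neg_scores: detector scores on clean inputs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count half
    return wins / (len(pos_scores) * len(neg_scores))


# The abstract's "41.2% improvement" is the relative gain of 0.911 over 0.645.
print(f"relative gain: {(0.911 - 0.645) / 0.645:.1%}")  # relative gain: 41.2%
```

An AUC of 0.5 corresponds to random guessing, so the competing detector's worst case of 0.211 is below chance on that test case.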


DVMSR: Distillated Vision Mamba for Efficient Super-Resolution


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lei, Xiaoyan; Zhang, Wenlong; Cao, Weifeng (Zhengzhou Univ Light Ind, Zhengzhou, Peoples R China; HongKong Polytech Univ, Hong Kong, Peoples R China)

ISBN: (print) 9798350365474

Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art efficient image SR methods are based on convolutional neural networks. Few attempts have been made to harness Mamba's long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight image SR network that incorporates Vision Mamba and a distillation strategy. The DVMSR network consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSBs), each of which has several Vision Mamba Modules (ViMM) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network. Specifically, we leverage the rich representation knowledge of the teacher network as additional supervision for the output of the lightweight student network. Extensive experiments demonstrate that the proposed DVMSR outperforms state-of-the-art efficient SR methods in terms of model parameters while maintaining both PSNR and SSIM performance. The source code is available at https://***/nathan66666/***

Keywords: Efficient Image Super-Resolution; Vision Mamba
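DVMSR uses the teacher network's outputs as additional supervision for the lightweight student. The abstract does not give the exact loss, so the following is only a generic output-level distillation sketch, assuming an L1 term against the high-resolution ground truth plus an L1 term against the teacher's prediction, mixed by a hypothetical weight `alpha`; none of these names come from the DVMSR code.

```python
def l1(a: list[float], b: list[float]) -> float:
    """Mean absolute error between two flattened images."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally sized")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_out: list[float],
    teacher_out: list[float],
    ground_truth: list[float],
    alpha: float = 0.5,
) -> float:
    """Blend ground-truth and teacher supervision for the student.

    alpha = 0 trains on the ground truth only; alpha = 1 only imitates
    the teacher. teacher_out is treated as a fixed target.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1 - alpha) * l1(student_out, ground_truth) + alpha * l1(student_out, teacher_out)


print(distillation_loss([1.0, 2.0], [1.0, 2.0], [3.0, 2.0], alpha=0.25))
```

The teacher term gives the small student a smoother target than the ground truth alone, which is the usual motivation for output-level distillation in lightweight SR.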


Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Okamoto, Lauren; Parmar, Paritosh (Princeton Univ, Princeton, NJ 08544, USA; ASTAR IHPC, Singapore, Singapore)

ISBN: (print) 9798350365474

Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action. Current AQA approaches are end-to-end neural models, which lack transparency and tend to be biased because they are trained on subjective human judgements as ground-truth. To address these issues, we introduce a neuro-symbolic paradigm for AQA, which uses neural networks to abstract interpretable symbols from video data and makes quality assessments by applying rules to those symbols. We take diving as the case study. We found that domain experts prefer our system and find it more informative than purely neural approaches to AQA in diving. Our system also achieves state-of-the-art action recognition and temporal segmentation, and automatically generates a detailed report that breaks the dive down into its elements and provides objective scoring with visual evidence. As verified by a group of domain experts, this report may be used to assist judges in scoring, help train judges, and provide feedback to divers. Annotated training data and code: https://***/laurenok24/NSAQA.

Keywords: action quality assessment; action recognition; AI Coach; AI Diving Coach; AI Diving Judge; AI Olympics Judge; explainable AI; fairness in AI; interpretable action analysis; interpretable action quality assessment; interpretable fine-grained action quality assessment; neuro-symbolic computer vision; neurosymbolic action assessment; neurosymbolic action scoring; neurosymbolic AI; neurosymbolic fine-grained action analysis; neurosymbolic fine-grained action quality assessment; neurosymbolic fine-grained action recognition; neurosymbolic fine-grained action understanding; neurosymbolic skills assessment; neurosymbolic temporal segmentation; neurosymbolic video understanding; Olympics Scoring; representation learning; skills assessment; temporal segmentation; transparent AI; XAI


Joint Multimodal Transformer for Emotion Recognition in the Wild


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Waligora, Paul; Aslam, Muhammad Haseeb; Zeeshan, Muhammad Osama; Belharbi, Soufiane; Koerich, Alessandro Lameiras; Pedersoli, Marco; Bacon, Simon; Granger, Eric (ETS Montreal, LIVIA, Dept Syst Engn, Montreal, PQ, Canada; Concordia Univ, Dept Hlth Kinesiol & Appl Physiol, Montreal, PQ, Canada)

ISBN: (print) 9798350365474

Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities. This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention. This framework can exploit the complementary nature of diverse modalities to improve predictive accuracy. Separate backbones capture intra-modal spatiotemporal dependencies within each modality over video sequences. Subsequently, our JMT fusion architecture integrates the individual modality embeddings, allowing the model to effectively capture inter- and intra-modal relationships. Extensive experiments on two challenging expression recognition tasks, (1) dimensional emotion recognition on the Affwild2 dataset (with face and voice) and (2) pain estimation on the Biovid dataset (with face and biosensors), indicate that our JMT fusion can provide a cost-effective solution for MMER. Empirical results show that MMER systems with our proposed fusion outperform relevant baseline and state-of-the-art methods. Code is available at: https://***/PoloWlg/Joint-Multimodal-Transformer-6th-ABAW

Keywords: Cross Attention; Joint Multimodal Transformer; Multimodal Emotion Recognition; Pain Estimation; Valence Arousal
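The JMT fuses modalities with cross-attention, where queries from one modality attend over keys and values from another. The sketch below is standard scaled dot-product cross-attention in plain Python, assuming the query, key and value vectors have already been projected; it illustrates the general mechanism only and omits the paper's key-based variant, learned projections and multiple heads. The function names are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: list[list[float]],  # one vector per time step of modality A (e.g. face)
    keys: list[list[float]],     # one vector per time step of modality B (e.g. voice)
    values: list[list[float]],   # aligned one-to-one with keys
) -> list[list[float]]:
    """Compute softmax(Q K^T / sqrt(d)) V row by row."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


print(cross_attention([[1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [2.0, 4.0]]))
```

Each output row lives in the value space of the other modality, so swapping which modality supplies the queries gives the complementary direction of fusion.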



235 University West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; Postal code: 010021
