Search Results - Inner Mongolia University Library


Search query: "Any field = 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010"

7,219 records in total; showing records 101-110

AIGeN: An Adversarial Approach for Instruction Generation in VLN

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rawal, Niyati; Bigazzi, Roberto; Baraldi, Lorenzo; Cucchiara, Rita

Affiliations: Univ Modena & Reggio Emilia, Modena, Italy

ISBN: 9798350365474 (print)

In the last few years, the research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal. Recent work in the literature focuses on different ways to augment the available datasets of instructions for improving navigation performance by exploiting synthetic training data. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve navigation agents' performance. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the training phase, the decoder generates sentences for a sequence of images describing the agent's path to a particular point while the encoder discriminates between real and fake instructions. Experimentally, we evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we generate synthetic instructions for 217K trajectories using AIGeN on Habitat-Matterport 3D Dataset (HM3D) and show an improvement in the performance of an off-the-shelf VLN method. The validation analysis of our proposal is conducted on REVERIE and R2R and highlights the promising aspects of our proposal, achieving state-of-the-art performance.

Keywords: Generative Adversarial Networks; Text Generation; Vision-and-Language Navigation

Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Englert, Bruno B.; Piva, Fabrizio J.; Kerssies, Tommie; de Geus, Daan; Dubbelman, Gijs

Affiliations: Eindhoven Univ Technol, Eindhoven, Netherlands

ISBN: 9798350365474 (print)

Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4x speed increase over the prior non-VFM state of the art, while also improving performance by +1.2 mIoU in the UDA setting and by +6.1 mIoU in terms of out-of-distribution generalization. Moreover, when we use a VFM with 3.6x more parameters, the VFM-UDA approach maintains a 3.3x speedup, while improving the UDA performance by +3.1 mIoU and the out-of-distribution performance by +10.3 mIoU. These results underscore the significant benefits of combining VFMs with UDA, setting new standards and baselines for Unsupervised Domain Adaptation in semantic segmentation. The implementation is available at https://***/tue-mps/vfmuda.

Keywords: foundation model; generalization; semantic segmentation; unsupervised domain adaptation; vision foundation model

CAGE: Circumplex Affect Guided Expression Inference

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wagner, Niklas; Maetzler, Felix; Vossberg, Samed R.; Schneider, Helen; Pavlitska, Svetlana; Zoellner, J. Marius

Affiliations: Karlsruhe Inst Technol KIT, Karlsruhe, Germany; FZI Res Ctr Informat Technol, Karlsruhe, Germany

ISBN: 9798350365474 (print)

Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to the common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scaled MaxViT-based model architecture, we evaluate the impact of discrete expression category labels in training with the continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels helps to significantly improve expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: https://***/wagner-niklas/CAGE_expression_inference.

Keywords: computer vision; Expression Inference; Transformer

NTIRE 2024 Challenge on Stereo Image Super-Resolution: Methods and Results

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Longguang; Guo, Yulan; Li, Juncheng; Liu, Hongda; Zhao, Yang; Wang, Yingqian; Jin, Zhi; Gu, Shuhang; Timofte, Radu

Affiliations: Aviation University of Air Force; Sun Yat-sen University; The Shenzhen Campus of Sun Yat-sen University, China; National University of Defense Technology, China; Shanghai University, China; University of Electronic Science and Technology of China, China; Computer Vision Lab, University of Würzburg, Germany

ISBN: 9798350365474 (print)

This paper summarizes the 3rd NTIRE challenge on stereo image super-resolution (SR) with a focus on new solutions and results. The task of this challenge is to super-resolve a low-resolution stereo image pair to a high-resolution one with a magnification factor of x4 under a limited computational budget. Compared with single image SR, the major challenge of this challenge lies in how to exploit additional information in another viewpoint and how to maintain stereo consistency in the results. This challenge has 2 tracks, including one track on bicubic degradation and one track on real degradations. In total, 108 and 70 participants were successfully registered for each track, respectively. In the test phase, 14 and 13 teams successfully submitted valid results with PSNR (RGB) scores better than the baseline. This challenge establishes a new benchmark for stereo image SR.

Keywords: Stereocenters

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Ning-Chi; Chang, Chi-Chih; Lin, Wei-Cheng; Taka, Endri; Marculescu, Diana; Wu, Kai-Chiang

Affiliations: Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan; Univ Texas Austin, Austin, TX, USA

ISBN: 9798350365474 (print)

N:M sparsity is an emerging model compression method supported by more and more accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing N:M sparsity methods compress neural networks with a uniform setting for all layers in a network or heuristically determine the layer-wise configuration by considering the number of parameters in each layer. However, very few methods have been designed for obtaining a layer-wise customized N:M sparse configuration for vision transformers (ViTs), which usually consist of transformer blocks involving the same number of parameters. In this work, to address the challenge of selecting suitable sparse configuration for ViTs on N:M sparsity-supporting accelerators, we propose ELSA, Exploiting Layer-wise N:M Sparsity for ViTs. Considering not only all N:M sparsity levels supported by a given accelerator but also the expected throughput improvement, our methodology can reap the benefits of accelerators supporting mixed sparsity by trading off negligible accuracy loss with both memory usage and inference time reduction for ViT models. For instance, our approach achieves a noteworthy 2.9x reduction in FLOPs to both Swin-B and DeiT-B with only a marginal degradation of accuracy on ImageNet. Our code is publicly available at https://***/ningchihuang/ELSA.

Keywords: Deep neural networks

Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gupta, Anshul; Vuillecard, Pierre; Farkhondeh, Arya; Odobez, Jean-Marc

Affiliations: Idiap Res Inst, Martigny, Switzerland; Ecole Polytech Fed Lausanne, Lausanne, Switzerland

ISBN: 9798350365474 (print)

Contextual cues related to a person's pose and interactions with objects and other people in the scene can provide valuable information for gaze following. While existing methods have focused on dedicated cue extraction methods, in this work we investigate the zero-shot capabilities of Vision-Language Models (VLMs) for extracting a wide array of contextual cues to improve gaze following performance. We first evaluate various VLMs, prompting strategies, and in-context learning (ICL) techniques for zero-shot cue recognition performance. We then use these insights to extract contextual cues for gaze following, and investigate their impact when incorporated into a state-of-the-art model for the task. Our analysis indicates that BLIP-2 is the overall top performing VLM and that ICL can improve performance. We also observe that VLMs are sensitive to the choice of the text prompt although ensembling over multiple text prompts can provide more robust performance. Additionally, we discover that using the entire image along with an ellipse drawn around the target person is the most effective strategy for visual prompting. For gaze following, incorporating the extracted cues results in better generalization performance, especially when considering a larger set of cues, highlighting the potential of this approach.

Keywords: Gaze Following; Vision-Language; Zero-Shot Evaluation

Multimodal Attack Detection for Action Recognition Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mumcu, Furkan; Yilmaz, Yasin

Affiliations: Univ S Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA

ISBN: 9798350365474 (print)

Adversarial machine learning attacks on video action recognition models are a growing research area, and many effective attacks were introduced in recent years. These attacks show that action recognition models can be breached in many ways. Hence using these models in practice raises significant security concerns. However, there are very few works which focus on defending against or detecting attacks. In this work, we propose a novel universal detection method which is compatible with any action recognition model. In our extensive experiments, we show that our method consistently detects various attacks against different target models with high true positive rates while satisfying very low false positive rates. Tested against four state-of-the-art attacks targeting four action recognition models, the proposed detector achieves an average AUC of 0.911 over 16 test cases while the best performance achieved by the existing detectors is 0.645 average AUC. This 41.2% improvement is enabled by the robustness of the proposed detector to varying attack methods and target models. The lowest AUC achieved by our detector across the 16 test cases is 0.837 while the competing detector's performance drops as low as 0.211. We also show that the proposed detector is robust to varying attack strengths. In addition, we analyze our method's real-time performance with different hardware setups to demonstrate its potential as a practical defense mechanism.

Keywords: Action Recognition Models; Adversarial machine learning attacks; Attack detection

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lei, Xiaoyan; Zhang, Wenlong; Cao, Weifeng

Affiliations: Zhengzhou Univ Light Ind, Zhengzhou, Peoples R China; HongKong Polytech Univ, Hong Kong, Peoples R China

ISBN: 9798350365474 (print)

Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSBs), each of which has several Vision Mamba Modules (ViMM) together with a residual connection. To achieve efficiency improvement while maintaining comparable performance, we employ a distillation strategy to the Vision Mamba network for superior performance. Specifically, we leverage the rich representation knowledge of the teacher network as additional supervision for the output of lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://***/nathan66666/***

Keywords: Efficient Image Super-Resolution; Vision Mamba

Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Okamoto, Lauren; Parmar, Paritosh

Affiliations: Princeton Univ, Princeton, NJ 08544, USA; ASTAR IHPC, Singapore, Singapore

ISBN: 9798350365474 (print)

Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action. Current AQA approaches are end-to-end neural models, which lack transparency and tend to be biased because they are trained on subjective human judgements as ground-truth. To address these issues, we introduce a neuro-symbolic paradigm for AQA, which uses neural networks to abstract interpretable symbols from video data and makes quality assessments by applying rules to those symbols. We take diving as the case study. We found that domain experts prefer our system and find it more informative than purely neural approaches to AQA in diving. Our system also achieves state-of-the-art action recognition and temporal segmentation, and automatically generates a detailed report that breaks the dive down into its elements and provides objective scoring with visual evidence. As verified by a group of domain experts, this report may be used to assist judges in scoring, help train judges, and provide feedback to divers. Annotated training data and code: https://***/laurenok24/NSAQA.

Keywords: action quality assessment; action recognition; AI Coach; AI Diving Coach; AI Diving Judge; AI Olympics Judge; explainable AI; fairness in AI; interpretable action analysis; interpretable action quality assessment; interpretable fine-grained action quality assessment; neuro-symbolic computer vision; neurosymbolic action assessment; neurosymbolic action scoring; neurosymbolic AI; neurosymbolic fine-grained action analysis; neurosymbolic fine-grained action quality assessment; neurosymbolic fine-grained action recognition; neurosymbolic fine-grained action understanding; neurosymbolic skills assessment; neurosymbolic temporal segmentation; neurosymbolic video understanding; Olympics Scoring; representation learning; skills assessment; temporal segmentation; transparent AI; XAI

Joint Multimodal Transformer for Emotion Recognition in the Wild

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Waligora, Paul; Aslam, Muhammad Haseeb; Zeeshan, Muhammad Osama; Belharbi, Soufiane; Koerich, Alessandro Lameiras; Pedersoli, Marco; Bacon, Simon; Granger, Eric

Affiliations: ETS Montreal, LIVIA, Dept Syst Engn, Montreal, PQ, Canada; Concordia Univ, Dept Hlth Kinesiol & Appl Physiol, Montreal, PQ, Canada

ISBN: 9798350365474 (print)

Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities. This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention. This framework can exploit the complementary nature of diverse modalities to improve predictive accuracy. Separate backbones capture intra-modal spatiotemporal dependencies within each modality over video sequences. Subsequently, our JMT fusion architecture integrates the individual modality embeddings, allowing the model to effectively capture inter- and intra-modal relationships. Extensive experiments on two challenging expression recognition tasks - (1) dimensional emotion recognition on the Affwild2 dataset (with face and voice) and (2) pain estimation on the Biovid dataset (with face and biosensors) - indicate that our JMT fusion can provide a cost-effective solution for MMER. Empirical results show that MMER systems with our proposed fusion allow us to outperform relevant baseline and state-of-the-art methods. Code is available at: https://***/PoloWlg/Joint-Multimodal-Transformer-6th-ABAW

Keywords: Cross Attention; Joint Multimodal Transformer; Multimodal Emotion Recognition; Pain Estimation; Valence Arousal


Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
