ISBN (Print): 9798350365474
This study investigates the integration of vision language models (VLMs) to enhance the classification of situations within rugby match broadcasts. Accurately identifying situations in sports videos is essential for understanding game dynamics and for downstream tasks such as performance evaluation and injury prevention. Using a dataset of 18,000 labeled images extracted at 0.2-second intervals from 100 minutes of rugby match broadcasts, we performed scene classification tasks covering contact plays (scrums, mauls, rucks, tackles, lineouts), rucks, tackles, lineouts, and multiclass classification. The study aims to validate the utility of VLM outputs in improving classification performance compared to using image data alone. Experimental results demonstrate substantial performance improvements across all tasks when VLM outputs are incorporated. Our analysis of prompts suggests that, when provided with appropriate contextual information through natural language, VLMs can effectively capture the context of a given image. These findings indicate that leveraging VLMs in sports analysis holds promise for developing image processing models capable of incorporating the tacit knowledge encoded within language models, as well as information conveyed through natural language descriptions.
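The core idea, combining image features with the VLM's natural-language output, can be illustrated with a minimal fusion head. The sketch below is an assumption-laden illustration, not the paper's architecture: feature dimensions, the embedding of the VLM description, and the concatenation strategy are all hypothetical.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Late-fusion head: concatenates image features with an embedding of
    the VLM's textual description (dims and fusion scheme are illustrative
    assumptions, not the paper's exact design)."""
    def __init__(self, img_dim=2048, text_dim=768, num_classes=5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + text_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, img_feat, vlm_feat):
        # img_feat: (B, img_dim) from a CNN/ViT backbone
        # vlm_feat: (B, text_dim) sentence embedding of the VLM output
        return self.head(torch.cat([img_feat, vlm_feat], dim=-1))

# Example: a batch of 4 frames with their VLM-description embeddings
logits = FusionClassifier()(torch.randn(4, 2048), torch.randn(4, 768))
```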
ISBN (Print): 9798350365474
In this paper, we explore the cross-modal adaptation of pre-trained vision Transformers (ViTs) to the audio-visual domain by incorporating a limited set of trainable parameters. To this end, we propose Spatial-Temporal-Global Cross-Modal Adaptation (STG-CMA), which gradually equips frozen ViTs with the capability to learn audio-visual representations. It consists of modality-specific temporal adaptation for temporal reasoning within each modality, cross-modal spatial adaptation for refining spatial information with cues from the counterpart modality, and cross-modal global adaptation for global interaction between the audio and visual modalities. STG-CMA yields a meaningful finding: a shared pre-trained image model with inserted lightweight adapters is sufficient for spatial-temporal modeling and feature interaction across audio-visual modalities. Extensive experiments indicate that STG-CMA achieves state-of-the-art performance on various audio-visual understanding tasks, including AVE, AVS, and AVQA, while requiring significantly fewer tunable parameters. The code is available at https://***/kaiw7/STG-CMA.
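The adapter paradigm the paper builds on can be sketched generically: freeze a pre-trained block and train only a small residual bottleneck. The sketch below shows a plain adapter, not STG-CMA's spatial/temporal/global variants; the dimensions and placement are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable parameters inserted alongside
    a frozen ViT block (a generic sketch; STG-CMA's actual adapters are
    more elaborate)."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual adaptation

# Freeze a pre-trained block; only the adapter's parameters are trainable
block = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in block.parameters():
    p.requires_grad = False
adapter = Adapter()
tokens = torch.randn(2, 196, 768)   # (batch, patches, dim)
out = adapter(block(tokens))        # adapted representation
```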
ISBN (Print): 9798350365474
Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking recent advances in Parameter-Efficient Fine-Tuning (PEFT). Furthermore, existing few-shot learning methods for VLMs often rely on heavy training procedures and/or carefully chosen, task-specific hyper-parameters, which might impede their applicability. In response, we introduce Low-Rank Adaptation (LoRA) for few-shot learning with VLMs and show its potential on 11 datasets, in comparison to current state-of-the-art prompt- and adapter-based approaches. Surprisingly, our simple CLIP-LoRA method exhibits substantial improvements while reducing training times and keeping the same hyper-parameters across all target tasks, i.e., across all datasets and numbers of shots. These surprising results do not, of course, dismiss the potential of prompt-learning and adapter-based research; however, we believe our strong baseline can be used to evaluate progress in these emergent subjects within few-shot VLMs.
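LoRA itself is standard and easy to sketch: keep the pre-trained weight frozen and learn a low-rank update. The code below is a minimal generic LoRA layer, not the CLIP-LoRA implementation; the rank, scaling, and the choice of which projection to wrap are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update
    W + (alpha/r) * B A -- a minimal LoRA sketch, not CLIP-LoRA's code."""
    def __init__(self, base: nn.Linear, r=4, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))  # e.g., an attention projection
y = layer(torch.randn(8, 512))
```

Because B is initialized to zero, the wrapped layer starts out identical to the frozen model, which is the usual LoRA design choice.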
ISBN (Print): 9798350365474
Colorectal polyps are prevalent precursors to colorectal cancer, making their accurate characterization essential for timely intervention and patient outcomes. Deep learning-based computer-aided diagnosis (CADx) systems have shown promising performance in the automated detection and categorization of colorectal polyps (CRPs) using endoscopic images. However, alongside advances in diagnostic accuracy, the need for reliable and accurate uncertainty estimates within these systems has become increasingly important. The primary focus of this study is on improving the reliability of computer-aided diagnosis of CRPs in clinical practice. We investigate widely used model calibration techniques and how they translate into clinical applications, specifically for CRP categorization data. The experiments reveal that the Variational Inference method excels at intra-dataset calibration but lacks efficiency and inter-dataset generalization. Laplace approximation and temperature scaling methods offer improved calibration across datasets.
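Of the calibration methods compared, temperature scaling is the simplest to show concretely: fit a single scalar on held-out validation logits and divide test-time logits by it. This is the standard recipe in general, sketched under assumed shapes; it is not the paper's exact experimental setup.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Post-hoc temperature scaling: learn one scalar T minimizing NLL
    on validation logits (standard recipe, illustrative settings)."""
    log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Calibrate: divide logits by the fitted temperature before softmax
val_logits = torch.randn(100, 3)              # toy 3-class validation logits
val_labels = torch.randint(0, 3, (100,))
T = fit_temperature(val_logits, val_labels)
calibrated_probs = F.softmax(val_logits / T, dim=-1)
```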
ISBN (Print): 9798350365474
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts the attacker-defined label for any sample containing the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, very few works explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective there. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into it. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, showing that attacking a single modality is enough to achieve a high attack success rate.
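The static/dynamic temporal extension can be illustrated with a toy poisoning routine: a static trigger sits at a fixed location in every frame, while a dynamic one drifts across frames. Trigger shape, placement, and motion below are invented for illustration, not the paper's attack parameters.

```python
import numpy as np

def poison_video(frames, trigger, dynamic=False):
    """Stamp an image backdoor trigger onto every frame of a clip.
    Static: fixed corner location; dynamic: the trigger drifts over time.
    A toy sketch of the two temporal extensions described above."""
    T, H, W, _ = frames.shape
    h, w, _ = trigger.shape
    out = frames.copy()
    for t in range(T):
        # moving horizontal offset for the dynamic variant, fixed otherwise
        dx = (t * 4) % (W - w) if dynamic else 0
        out[t, H - h:, dx:dx + w] = trigger
    return out

clip = np.zeros((16, 112, 112, 3), dtype=np.uint8)  # 16-frame clip
patch = np.full((8, 8, 3), 255, dtype=np.uint8)     # white-square trigger
poisoned = poison_video(clip, patch, dynamic=True)
```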
ISBN (Print): 9798350365474
In this paper, we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to identify violent activities effectively and efficiently. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatio-temporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods. The source code is available at (1).
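The linear-complexity attention family CUE-Net draws on replaces pairwise token interactions with per-token scalar scores pooled into a single global query. The sketch below is a simplified efficient additive attention in that spirit, assuming made-up dimensions; it is not CUE-Net's Modified Efficient Additive Attention.

```python
import torch
import torch.nn as nn

class EfficientAdditiveAttention(nn.Module):
    """Linear-complexity additive attention: score each token with a
    learned vector, pool tokens into one global query, and let that
    query modulate the keys. A simplified sketch, not CUE-Net's code."""
    def __init__(self, dim=256):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim))  # scoring vector
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                          # x: (B, N, dim)
        q, k = self.to_q(x), self.to_k(x)
        scores = (q @ self.w_a) * self.scale       # (B, N): scalar per token
        attn = scores.softmax(dim=-1).unsqueeze(-1)
        global_q = (attn * q).sum(dim=1, keepdim=True)  # (B, 1, dim)
        return self.proj(global_q * k) + q         # O(N) interaction

out = EfficientAdditiveAttention()(torch.randn(2, 196, 256))
```

Because no N-by-N attention matrix is ever formed, cost grows linearly with the number of tokens, which is the property the abstract highlights.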
ISBN (Print): 9798350365474
Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Because the complexity of human emotional expression gives rise to compound expressions, both local and global facial expressions must be considered comprehensively for recognition. In this paper, we propose a solution for compound expression recognition based on ensemble learning. Specifically, we frame the task as classification and train three expression classification models based on a convolutional network (ResNet50), a vision Transformer, and a multi-scale local attention network, respectively. Then, using late fusion, we integrate the outputs of the three models to predict the final result, leveraging the strengths of the different models. Our method achieves high accuracy on RAF-DB and, in the sixth Affective Behavior Analysis in-the-wild (ABAW) Challenge, achieves an F1 score of 0.224 on the C-EXPR-DB test set.
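Late fusion itself reduces to combining per-model class probabilities. The helper below shows the generic pattern under assumed equal weights; the paper's exact weighting scheme is not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def late_fusion(logits_list, weights=None):
    """Combine softmax outputs of several classifiers and take the argmax:
    the late-fusion step described above (equal weights assumed here)."""
    weights = weights or [1.0] * len(logits_list)
    probs = [w * F.softmax(l, dim=-1) for w, l in zip(weights, logits_list)]
    return torch.stack(probs).sum(dim=0).argmax(dim=-1)

# Three models (e.g., ResNet50, ViT, local-attention net) over 7 classes
preds = late_fusion([torch.randn(4, 7) for _ in range(3)])
```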
ISBN (Print): 9798350365474
The appearance of a face can be greatly altered by growing a beard or mustache. The facial hairstyles in a pair of images can cause marked changes to both the impostor and the genuine distributions. Moreover, different distributions of facial hairstyles across demographics can create a false impression of relative accuracy across demographics. We first show that, even though larger training sets boost recognition accuracy for all facial hairstyles, the accuracy variations caused by facial hairstyles persist regardless of training set size. We then analyze the impact of having different fractions of the training data represent facial hairstyles. We created balanced training sets using identities available in Webface42M that have both clean-shaven and facial-hair images. We find that even when a face recognition model is trained with a balanced clean-shaven / facial-hair training set, accuracy variation on the test data does not diminish. Next, data augmentation is employed to further investigate the effect of facial hair distribution in the training data, manipulating facial hair pixels with the help of facial landmark points and a facial hair segmentation model. Our results show that facial hair causes an accuracy gap between clean-shaven and facial-hair images, and that this impact can differ significantly between African-Americans and Caucasians.
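One generic way to approximate attribute balance during training is inverse-frequency sampling. The sketch below is only an analogue: the paper constructs balanced identity sets from Webface42M rather than reweighting a sampler, and the attribute labels here are hypothetical.

```python
import torch
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(has_facial_hair):
    """Oversample the minority attribute so batches are roughly balanced
    between clean-shaven (0) and facial-hair (1) images. A generic
    rebalancing sketch, not the paper's set-construction procedure."""
    labels = torch.as_tensor(has_facial_hair, dtype=torch.long)
    counts = torch.bincount(labels, minlength=2).float()
    weights = (1.0 / counts)[labels]   # inverse-frequency weight per image
    return WeightedRandomSampler(weights, num_samples=len(labels))

sampler = balanced_sampler([0, 0, 0, 1, 1])  # pass to DataLoader(sampler=...)
```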
ISBN (Print): 9798350365474
Recognizing interactions in multi-person videos, known as Video Interaction Recognition (VIR), is crucial for understanding video content. The human skeleton pose (skeleton, for short) is a popular main feature for VIR, given its success for the task at hand. While many studies have made progress using complex architectures such as Graph Neural Networks (GNNs) and Transformers to capture interactions in videos, studies such as [33] that apply simple, easy-to-train, and adaptive architectures such as the Relation Network (RN) [37] yield competitive results. Inspired by this trend, we propose the Attention Augmented Relational Network (AARN), a straightforward yet effective model that uses skeleton data to recognize interactions in videos. AARN outperforms other RN-based models and remains competitive against larger, more intricate models. We evaluate our approach on a challenging real-world Hockey Penalty Dataset (HPD), whose videos depict complex interactions between players in a non-laboratory recording setup, in addition to popular benchmark datasets, demonstrating strong performance. Lastly, we show the impact of skeleton quality on classification accuracy and the difficulty off-the-shelf pose estimators have in extracting precise skeletons from the challenging HPD dataset.
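The RN paradigm AARN extends can be sketched directly over skeleton joints: score every joint pair with a shared MLP g, sum the pair embeddings, and classify with f. The code below is a plain Relation Network over one skeleton, with invented dimensions; AARN's attention augmentation and multi-person handling are not shown.

```python
import torch
import torch.nn as nn

class SkeletonRN(nn.Module):
    """Relation-Network-style classifier over skeleton joints: a shared
    MLP g scores every (i, j) joint pair, pair embeddings are summed,
    and f produces class logits. A baseline sketch, not AARN itself."""
    def __init__(self, joint_dim=2, hidden=64, num_classes=4):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * joint_dim, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, num_classes))

    def forward(self, joints):                      # (B, J, joint_dim)
        B, J, D = joints.shape
        a = joints.unsqueeze(2).expand(B, J, J, D)  # joint i, broadcast
        b = joints.unsqueeze(1).expand(B, J, J, D)  # joint j, broadcast
        pairs = torch.cat([a, b], dim=-1)           # all (i, j) pairs
        return self.f(self.g(pairs).sum(dim=(1, 2)))

logits = SkeletonRN()(torch.randn(8, 17, 2))        # e.g., 17 COCO joints
```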
ISBN (Print): 9798350365474
Semantic segmentation is a key task in applications of machine learning to medical imaging, requiring large amounts of medical scans annotated by clinicians. The high cost of data annotation means that models need to make the most of all available ground truth masks; yet many models treat two false positive (or false negative) pixel predictions as 'equally wrong' regardless of each pixel's position relative to the ground truth mask. These methods also have no sense of whether a pixel is solitary or belongs to a contiguous group. We propose the Hairy transform, a novel method for enhancing ground truths using 3D 'hairs' that represent each pixel's position relative to objects in the ground truth. We demonstrate its effectiveness using a mainstream model and loss function on a commonly used cardiac MRI dataset, as well as on synthetic data constructed to highlight the effect of the method during training. The overall improvement in segmentation results comes at the small cost of a one-off pre-processing step, and the method can easily be integrated into any standard machine learning model. Rather than seeking minute improvements on mostly correct 'standard' masks, we show how this method improves robustness against catastrophic failures in edge cases.
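The underlying intuition, that a pixel's position relative to the mask should carry information, is commonly encoded with a signed distance transform. The sketch below is explicitly not the Hairy transform (whose 3D 'hairs' the abstract only names); it is a well-known distance-based analogue of the same idea, offered as a one-off pre-processing step on the ground truth.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Encode each pixel's signed distance to the ground-truth boundary
    (positive inside the object, negative outside). An analogue of
    position-aware ground-truth enrichment, NOT the Hairy transform."""
    inside = distance_transform_edt(mask)       # depth into the object
    outside = distance_transform_edt(1 - mask)  # distance from the object
    return inside - outside

gt = np.zeros((64, 64), dtype=np.uint8)
gt[20:40, 20:40] = 1                            # toy square structure
sdm = signed_distance(gt)                       # extra supervision channel
```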