The use of helmets is essential for motorcyclists' safety, but non-compliance with helmet rules remains a common issue. In this study, we extend the frontier of AI video analytics for detecting violations of helmet rules among motorcyclists. Our method handles conditions that are highly challenging for traditional methods, including occlusions, fast vehicle movement, shadows, large viewing angles, poor illumination, and adverse weather. We adopt the widely used YOLOv7 object detector and develop a first baseline using YOLOv7-E6E. We further develop two improved versions, YOLOv7-CBAM and YOLOv7-SimAM, that better address these challenges. Experiments are performed on the 2023 AI City Challenge Track 5 contest benchmark. Evaluation on the 100 test videos of the contest demonstrates the effectiveness of our approach. The baseline YOLOv7-E6E model, trained with image size 1920, achieves 0.6112 mAP. YOLOv7-CBAM achieves 0.6389 mAP and YOLOv7-SimAM achieves 0.6422 mAP, both trained with image size 1280. These models rank sixth, fifth, and fourth on the public leaderboard, respectively, outperforming more than 36 global participating teams. The code for our models is available at: https://***/cmtsai2023/AICITY2023_Track5_DVHRM.
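As a concrete point of reference for the attention module named above, the sketch below is a minimal PyTorch re-implementation of the parameter-free SimAM block that is commonly plugged into YOLO backbones; the exact placement inside YOLOv7 and the `e_lambda` default are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention; a generic re-implementation, not the
    authors' exact YOLOv7-SimAM configuration."""
    def __init__(self, e_lambda: float = 1e-4):  # assumed default regularizer
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); compute an energy-based weight per neuron
        _, _, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)   # squared deviation
        v = d.sum(dim=[2, 3], keepdim=True) / n             # per-channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5         # inverse energy
        return x * torch.sigmoid(e_inv)                     # re-weight features
```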
ISBN (print): 9781665445092
There are rich synchronized audio and visual events in our daily life. Within these events, audio scenes are associated with the corresponding visual objects; meanwhile, sounding objects can indicate, and help to separate, their individual sounds in the audio track. Based on this observation, in this paper we propose a cyclic co-learning (CCoL) paradigm that can jointly learn sounding-object visual grounding and audio-visual sound separation in a unified framework. Concretely, we leverage grounded object-sound relations to improve the results of sound separation. Meanwhile, benefiting from discriminative information in the separated sounds, we improve training-example sampling for sounding-object grounding, which builds a co-learning cycle between the two tasks and makes them mutually beneficial. Extensive experiments show that the proposed framework outperforms the compared recent approaches on both tasks, and that the two tasks benefit from each other through cyclic co-learning.
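A schematic of how the two tasks could be coupled in practice is sketched below; the shapes, the confidence weighting, and the pseudo-label rule are illustrative assumptions rather than the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def colearning_losses(ground_logits, sep_masks, gt_masks):
    """Illustrative coupling: the separation loss of each candidate object is
    weighted by its grounding confidence, and separation quality in turn supplies
    soft labels for grounding. ground_logits: (B, N); masks: (B, N, F, T)."""
    ground_conf = torch.sigmoid(ground_logits)
    per_obj_err = F.l1_loss(sep_masks, gt_masks, reduction="none").mean(dim=(2, 3))
    sep_loss = (ground_conf * per_obj_err).mean()      # grounding guides separation
    # objects whose sounds separate well become positives for grounding
    pseudo = (per_obj_err.detach() < per_obj_err.detach().median()).float()
    ground_loss = F.binary_cross_entropy_with_logits(ground_logits, pseudo)
    return sep_loss, ground_loss
```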
ISBN (print): 9781665445092
Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models. To measure generalization to novel questions, we propose to separate them into "skills" and "concepts". "Skills" are visual tasks, such as counting or attribute recognition, and are applied to "concepts" mentioned in the question, such as objects and people. VQA methods should be able to compose skills and concepts in novel ways, regardless of whether the specific composition has been seen in training; yet we demonstrate that existing models have substantial room for improvement in handling new compositions. We present a novel method for learning to compose skills and concepts that separates these two factors implicitly within a model, by learning grounded concept representations and disentangling the encoding of skills from that of concepts. We enforce these properties with a novel contrastive learning procedure that does not rely on external annotations and can be learned from unlabeled image-question pairs. Experiments demonstrate the effectiveness of our approach for improving compositional and grounding performance.
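For readers unfamiliar with the contrastive ingredient, the snippet below is a generic InfoNCE-style objective of the kind such a procedure could build on; how positives and negatives are actually constructed from image-question pairs is not specified in the abstract and is left abstract here.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """Generic contrastive loss: pull each anchor toward its positive and away
    from K negatives. anchor, positive: (B, D); negatives: (B, K, D)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / temperature      # (B, 1)
    neg = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature  # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)  # the positive sits at index 0
```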
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
This paper provides an efficiency study of training Masked Autoencoders (MAE), a framework introduced by He et al. [13] for pre-training vision Transformers (ViTs). Our results surprisingly reveal that MAE can learn at a faster speed and with fewer training samples while maintaining high performance. To accelerate its training, our changes are simple and straightforward: in the pre-training stage, we aggressively increase the masking ratio, decrease the number of training epochs, and reduce the decoder depth to lower the pre-training cost; in the fine-tuning stage, we demonstrate that layer-wise learning rate decay plays a vital role in unlocking the full potential of pre-trained models. Under this setup, we further verify the sample efficiency of MAE: training MAE is hardly affected even when using only 20% of the original training data. By combining these strategies, we are able to accelerate MAE pre-training by a factor of 82 or more with little performance drop. For example, we are able to pre-train a ViT-B in ~9 hours using a single NVIDIA A100 GPU and achieve 82.9% top-1 accuracy on the downstream ImageNet classification task. Additionally, we verify the speed acceleration on another MAE extension, SupMAE.
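Since the abstract singles out layer-wise learning rate decay as the key fine-tuning ingredient, the helper below sketches the standard MAE/BEiT-style recipe for a timm-like ViT; the parameter-name matching and the default decay factor are assumptions, not the paper's exact settings.

```python
import torch

def layerwise_lr_param_groups(model, base_lr=1e-3, decay=0.75, num_layers=12):
    """Assign smaller learning rates to earlier ViT blocks (layer-wise LR decay).
    Assumes timm-style names: `patch_embed`, `pos_embed`, `cls_token`, `blocks.<i>`."""
    def layer_id(name: str) -> int:
        if name.startswith(("cls_token", "pos_embed", "patch_embed")):
            return 0
        if name.startswith("blocks."):
            return int(name.split(".")[1]) + 1
        return num_layers + 1  # head / final norm keep the full base_lr

    groups = {}
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        lid = layer_id(name)
        scale = decay ** (num_layers + 1 - lid)
        groups.setdefault(lid, {"params": [], "lr": base_lr * scale})
        groups[lid]["params"].append(p)
    return list(groups.values())

# e.g. optimizer = torch.optim.AdamW(layerwise_lr_param_groups(vit), weight_decay=0.05)
```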
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Recently, the advancement and evolution of generative AI have been highly compelling. In this paper, we present OpenStory, a large-scale dataset tailored for training subject-focused story visualization models to generate coherent and contextually relevant visual narratives. To address the challenges of maintaining subject continuity across frames and capturing compelling narratives, we propose an innovative pipeline that automates the extraction of keyframes from open-domain videos. It employs vision-language models to generate descriptive captions, which are then refined by a large language model to ensure narrative flow and coherence. Furthermore, advanced subject masking techniques are applied to isolate and segment the primary subjects. Derived from diverse video sources, including YouTube and existing datasets, OpenStory offers a comprehensive open-domain resource, surpassing prior datasets confined to specific scenarios. With automated captioning instead of manual annotation, high-resolution imagery optimized for subject count per frame, and extensive frame sequences ensuring consistent subjects for temporal modeling, OpenStory establishes itself as an invaluable benchmark. It facilitates advancements in subject-focused story visualization, enabling the training of models capable of comprehending and generating intricate multi-modal narratives from extensive visual and textual inputs.
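The annotation pipeline described above can be pictured as a small composition of three model calls; the skeleton below is purely illustrative, and `caption_fn`, `refine_fn`, and `mask_fn` are hypothetical stand-ins for the unnamed vision-language, language, and segmentation models.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StoryFrame:
    image_path: str
    caption: str
    subject_mask_path: str

def build_story_samples(
    keyframes: List[str],
    caption_fn: Callable[[str], str],             # hypothetical VLM captioner
    refine_fn: Callable[[List[str]], List[str]],  # hypothetical LLM caption refiner
    mask_fn: Callable[[str], str],                # hypothetical subject segmenter
) -> List[StoryFrame]:
    """Caption each keyframe, refine the captions jointly for narrative flow,
    then attach a subject mask per frame."""
    captions = refine_fn([caption_fn(f) for f in keyframes])
    return [StoryFrame(f, c, mask_fn(f)) for f, c in zip(keyframes, captions)]
```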
ISBN (print): 9781665445092
Anomaly detection methods require high-quality features. In recent years, the anomaly detection community has attempted to obtain better features using advances in deep self-supervised feature learning. Surprisingly, a very promising direction, using pre-trained deep features, has been mostly overlooked. In this paper, we first empirically establish the perhaps expected, but previously unreported, result that combining pre-trained features with simple anomaly detection and segmentation methods convincingly outperforms much more complex state-of-the-art methods. In order to obtain further performance gains in anomaly detection, we adapt pre-trained features to the target distribution. Although transfer learning methods are well established in multi-class classification problems, the one-class classification (OCC) setting is not as well explored. It turns out that naive adaptation methods, which typically work well in supervised learning, often result in catastrophic collapse (feature deterioration) and reduce performance in OCC settings. A popular OCC method, DeepSVDD, advocates using specialized architectures, but this limits the adaptation performance gain. We propose two methods for combating collapse: i) a variant of early stopping that dynamically learns the stopping iteration, and ii) elastic regularization inspired by continual learning. Our method, PANDA, outperforms the state of the art in the OCC, outlier exposure, and anomaly segmentation settings by large margins.
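To make the second anti-collapse ingredient concrete, the snippet below shows a generic EWC-style elastic penalty that keeps adapted weights close to their pre-trained values, weighted by diagonal Fisher information; it is a continual-learning sketch in the spirit described above, not PANDA's exact formulation.

```python
import torch

def elastic_penalty(model, pretrained, fisher, lam=100.0):
    """EWC-like regularizer: penalize drift of each adapted parameter from its
    pre-trained value, scaled by its (diagonal) Fisher importance.
    `pretrained` and `fisher` are dicts keyed by parameter name."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - pretrained[name]).pow(2)).sum()
    return lam * loss  # add to the OCC adaptation loss during fine-tuning
```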
In recent years, Federated Learning (FL) has emerged as a promising solution for many computer vision applications due to its effectiveness in handling data privacy and communication overhead. However, when applying FL to advanced and computationally heavy tasks like video-based action recognition, FL clients can struggle with a lack of annotated data and with model biases, thus negatively impacting learning performance. Therefore, adopting Few-Shot Learning (FSL) is essential, where the learned model can adapt to unseen classes using limited labeled examples. Nonetheless, FSL has rarely been exploited for vision tasks under FL settings. In this paper, we develop a Federated Few-Shot Learning framework, FedFSLAR, that collaboratively learns the classification model from multiple FL clients to recognize unseen actions with a few labeled video samples. Prior works in few-shot action recognition mostly use 2D-CNNs as feature backbones and fail to effectively capture the temporal correlation between video frames. To overcome this limitation and enable more robust representations, we integrate spatiotemporal feature backbones based on 3D-CNNs into a meta-learning paradigm, i.e., ProtoNet. Accordingly, we conduct extensive experiments under practical FL settings, e.g., non-IID data, to evaluate various 3D-CNN models alongside representative FL algorithms, i.e., FedAvg and FedProx. Experimental results on benchmark datasets validate the effectiveness of our FedFSLAR framework. Remarkably, our findings indicate that combining feature backbones pre-trained on external data with the FL setting can substantially benefit FSL. Our framework offers a viable path toward notable progress in FL and FSL for action recognition tasks.
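The meta-learning head referenced above (ProtoNet) reduces to a nearest-prototype classifier over backbone features; the sketch below shows that head on clip-level 3D-CNN features, leaving out the federated aggregation (FedAvg/FedProx) and episode sampling, and the shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def prototypical_logits(support_feats, support_labels, query_feats, n_way):
    """Prototypical-network head: class prototypes are mean support embeddings,
    and queries are scored by negative squared Euclidean distance.
    support_feats: (n_way * k_shot, D); query_feats: (n_query, D)."""
    prototypes = torch.stack(
        [support_feats[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                                  # (n_way, D)
    return -torch.cdist(query_feats, prototypes).pow(2)

# per-episode loss: F.cross_entropy(prototypical_logits(s, y_s, q, n_way), y_q)
```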
ISBN (print): 9781665445092
Batch normalization (BN) is an important technique commonly incorporated into deep learning models to perform standardization within mini-batches. The merits of BN in improving a model's learning efficiency can be further amplified by applying whitening, while its drawbacks in estimating population statistics for inference can be avoided through group normalization (GN). This paper proposes group whitening (GW), which exploits the advantages of the whitening operation and avoids the disadvantages of normalization within mini-batches. In addition, we analyze the constraints imposed on features by normalization, and show how the batch size (group number) affects the performance of batch (group) normalized networks, from the perspective of a model's representational capacity. This analysis provides theoretical guidance for applying GW in practice. Finally, we apply the proposed GW to ResNet and ResNeXt architectures and conduct experiments on the ImageNet and COCO benchmarks. Results show that GW consistently improves the performance of different architectures, with absolute gains of 1.02% to 1.49% in top-1 accuracy on ImageNet and 1.82% to 3.21% in bounding box AP on COCO.
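A simplified per-sample version of the group whitening operation is sketched below: channels are split into groups and ZCA-whitened within each group, using the spatial positions as samples. The group count, the eigendecomposition-based inverse square root, and the absence of learnable affine parameters are simplifications of the method described above, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GroupWhitening(nn.Module):
    """Whiten features within channel groups of each sample (simplified sketch)."""
    def __init__(self, num_channels, num_groups=16, eps=1e-5):
        super().__init__()
        assert num_channels % num_groups == 0
        self.g, self.eps = num_groups, eps

    def forward(self, x):
        b, c, h, w = x.shape
        d = c // self.g
        xg = x.reshape(b * self.g, d, h * w)        # d channels, h*w spatial samples
        xg = xg - xg.mean(dim=-1, keepdim=True)     # center within each group
        cov = xg @ xg.transpose(1, 2) / (h * w)
        cov = cov + self.eps * torch.eye(d, device=x.device)
        eigval, eigvec = torch.linalg.eigh(cov)     # ZCA whitening: Sigma^{-1/2} x
        inv_sqrt = eigvec @ torch.diag_embed(eigval.clamp_min(self.eps).rsqrt())
        whitener = inv_sqrt @ eigvec.transpose(1, 2)
        return (whitener @ xg).reshape(b, c, h, w)
```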
ISBN (print): 9781665445092
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars. In this paper, we study the problem of predicting dense depth from a single RGB image (monodepth) with optional sparse measurements from low-cost active depth sensors. We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion, depending on whether only RGB images or also sparse point clouds are available at inference time. First, we decouple the image and depth map encoding stages using sparse convolutions to process only the valid depth map pixels. Second, we inject this information, when available, into the skip connections of the depth prediction network, augmenting its features. Through extensive experimental analysis on one indoor (NYUv2) and two outdoor (KITTI and DDAD) benchmarks, we demonstrate that our proposed SAN architecture is able to simultaneously learn both tasks, while achieving a new state of the art in depth prediction by a significant margin.
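The injection of sparse depth into the skip connections can be illustrated with a validity-mask-normalized convolution, as sketched below; this replaces the true sparse convolutions used by SANs with a simpler stand-in and is not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseDepthEncoder(nn.Module):
    """Encode a sparse depth map while normalizing by the count of valid pixels
    (a simple stand-in for proper sparse convolutions)."""
    def __init__(self, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(1, out_channels, kernel_size=3, padding=1, bias=False)

    def forward(self, sparse_depth):
        mask = (sparse_depth > 0).float()
        feat = self.conv(sparse_depth * mask)
        valid = F.avg_pool2d(mask, 3, stride=1, padding=1) * 9.0  # valid-pixel count
        return feat / valid.clamp_min(1.0)

def fuse_skip(rgb_skip, sparse_depth, depth_encoder):
    """Add depth features to an RGB skip connection only when depth is available,
    so one network covers both depth prediction and depth completion."""
    if sparse_depth is None:
        return rgb_skip                              # prediction mode: RGB only
    depth_feat = depth_encoder(sparse_depth)
    depth_feat = F.interpolate(depth_feat, size=rgb_skip.shape[-2:], mode="nearest")
    return rgb_skip + depth_feat                     # completion mode
```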
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training approach in the vision domain. However, the mechanism and properties of the representations learned by such a scheme, as well as how to further enhance them, are so far not well explored. In this paper, we explore an interactive Masked Autoencoders (i-MAE) framework to enhance the representation capability from two aspects: (1) employing a two-way image reconstruction and a latent feature reconstruction with a distillation loss to learn better features; (2) proposing a semantics-enhanced sampling strategy to boost the semantics learned by MAE. Upon the proposed i-MAE architecture, we address two critical questions about the behavior of the learned representations in MAE: (1) Is the separability of latent representations in Masked Autoencoders helpful for model performance? We study this by forcing the input to be a mixture of two images instead of one. (2) Can we enhance the representations in the latent feature space by controlling the degree of semantics during sampling in Masked Autoencoders? To this end, we propose a sampling strategy within a mini-batch based on the semantics of training samples to examine this aspect. Extensive experiments are conducted on the CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K datasets to verify the observations we discovered. Furthermore, in addition to qualitatively analyzing the characteristics of the latent representations, we examine the existence of linear separability and the degree of semantics in the latent space by proposing two evaluation schemes. The surprising and consistent results across the qualitative and quantitative experiments demonstrate that i-MAE is a superior framework design for understanding MAE frameworks, as well as achieving better representational ability.
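The first question above hinges on feeding the encoder a mixture of two images and asking two reconstruction branches to recover each one; a minimal sketch of that input construction is given below, with the fixed mixing ratio being an assumption.

```python
import torch

def mix_batch(images, alpha=0.5):
    """Pair each image with a randomly permuted partner and blend them; the encoder
    sees `mixed`, while the two reconstruction branches target the two originals.
    images: (B, C, H, W)."""
    perm = torch.randperm(images.size(0), device=images.device)
    partner = images[perm]
    mixed = alpha * images + (1 - alpha) * partner
    return mixed, images, partner
```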