Search Results - Inner Mongolia University Library

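Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1 (subject term "library science" or "information science", author 范并思, years 1982 to 2016): (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2 (publication 计算机应用与软件, any field containing C++ or Basic, subject term not Visual, years 2011 to 2016): P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016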
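The field codes and operator rules above can be assembled programmatically. The following is a minimal sketch that rebuilds the first example as a string; the helper names (`term`, `combine`, `group`) are hypothetical and are not part of any catalogue API, and the sketch only formats text without sending a request.

```python
# Field codes copied from the help text; English glosses are informal.
FIELD_CODES = {
    "T": "title", "A": "author", "K": "subject term",
    "P": "publication name", "PU": "publisher name", "O": "organization",
    "L": "Chinese Library Classification number",
    "C": "discipline classification number", "U": "all fields", "Y": "year",
}
# The help text requires uppercase operators with one space on each side.
OPERATORS = ("AND", "OR", "NOT")


def term(field, value):
    """Format one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator, *expressions):
    """Join expressions with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {OPERATORS}: {operator!r}")
    return f" {operator} ".join(expressions)


def group(expression):
    """Wrap an expression in parentheses, as the examples do for OR groups."""
    return f"({expression})"


subjects = group(combine("OR", term("K", "图书馆学"), term("K", "情报学")))
example_one = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(example_one)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be written in uppercase; how the catalogue itself treats a lowercase operator is not stated in the help text.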

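Refine search results

Document type

7,181 conference papers
27 journal articles
4 books

Collection scope

7,211 electronic documents
1 print holding

Discipline classification

4,396 papers: Engineering
- 4,002 papers: Computer Science and Technology...
- 1,810 papers: Software Engineering
- 869 papers: Optical Engineering
- 401 papers: Control Science and Engineering
- 388 papers: Mechanical Engineering
- 375 papers: Information and Communication Engineering
- 222 papers: Instrument Science and Technology
- 203 papers: Electrical Engineering
- 125 papers: Biomedical Engineering (degrees may be conferred in...
- 111 papers: Bioengineering
- 100 papers: Electronic Science and Technology (degrees may...
- 45 papers: Chemical Engineering and Technology
- 42 papers: Architecture
- 42 papers: Safety Science and Engineering
- 38 papers: Civil Engineering
- 35 papers: Mechanics (degrees may be conferred in Engineering, ...
- 35 papers: Aeronautical and Astronautical Science and Tech...
- 30 papers: Transportation Engineering
1,816 papers: Science
- 1,159 papers: Mathematics
- 1,046 papers: Physics
- 406 papers: Statistics (degrees may be conferred in Science, ...
- 178 papers: Biology
- 48 papers: Systems Science
- 45 papers: Chemistry
225 papers: Medicine
- 224 papers: Clinical Medicine
220 papers: Management
- 166 papers: Library, Information and Archives Manage...
- 58 papers: Management Science and Engineering (degrees may...
- 32 papers: Business Administration
151 papers: Art Studies
- 151 papers: Design (degrees may be conferred in Art Studies...
30 papers: Law
- 29 papers: Sociology
24 papers: Agriculture
10 papers: Education
8 papers: Economics
2 papers: Literature
2 papers: Military Science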

Topic

2,406 papers: computer vision
846 papers: pattern recognit...
694 papers: cameras
658 papers: computer science
653 papers: face recognition
594 papers: layout
543 papers: image segmentati...
516 papers: conferences
514 papers: shape
475 papers: object recogniti...
467 papers: robustness
424 papers: humans
371 papers: feature extracti...
340 papers: object detection
318 papers: training
282 papers: application soft...
280 papers: image recognitio...
265 papers: lighting
245 papers: computational mo...
239 papers: image reconstruc...

Institution

41 papers: microsoft resear...
26 papers: department of co...
24 papers: school of comput...
24 papers: institute for co...
21 papers: swiss fed inst t...
20 papers: swiss fed inst t...
20 papers: carnegie mellon ...
20 papers: department of co...
18 papers: department of co...
18 papers: school of comput...
17 papers: department of in...
17 papers: the robotics ins...
17 papers: institute of com...
16 papers: univ sci & techn...
16 papers: department of el...
16 papers: robotics institu...
15 papers: national laborat...
15 papers: computer vision ...
15 papers: tsinghua univ pe...
15 papers: school of comput...

Author

39 papers: timofte radu
28 papers: s.k. nayar
27 papers: huang thomas s.
23 papers: xiaoou tang
23 papers: bischof horst
22 papers: van gool luc
22 papers: t. kanade
20 papers: t.s. huang
19 papers: t. darrell
19 papers: jain anil k.
18 papers: nayar shree k.
18 papers: torralba antonio
18 papers: chellappa rama
17 papers: a.k. jain
17 papers: a. zisserman
17 papers: zisserman andrew
16 papers: zhang lei
16 papers: g. healey
16 papers: heung-yeung shum
16 papers: yan shuicheng

Language

7,155 papers: English
56 papers: Chinese
1 paper: other

Search query: "Any field = 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010"

7,212 records in total; showing records 91-100

VMCML: Video and Music Matching via Cross-Modality Lifting

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lee, Yi-Shan; Tseng, Wei-Cheng; Wang, Fu-En; Sun, Min

Affiliations: Natl Tsing Hua Univ Hsinchu Taiwan; Univ Toronto Toronto ON Canada; Vector Inst Toronto ON Canada

ISBN: 9798350365474 (print)

We propose a content-based system for matching video and background music. The system aims to address the challenges in music recommendation for new users or new music given short-form videos. To this end, we propose a cross-modal framework VMCML (Video and Music Matching via Cross-Modality Lifting) that finds a shared embedding space between video and music representations. To ensure the embedding space can be effectively shared by both representations, we leverage the CosFace loss, a margin-based cosine similarity loss. Furthermore, because previous datasets have limitations, we collect videos and music from a well-known multi-media platform, following the rule that the music is not the original sound of the video and that more than one video is matched to the same music. We establish a large-scale dataset called MSV, which provides 390 individual music tracks and the corresponding 150,000 matched videos. We conduct extensive experiments on YouTube-8M and our MSV datasets. Our quantitative and qualitative results demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art video and music matching performance.

Keywords: computer music
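The VMCML abstract above names CosFace, a margin-based cosine similarity loss. The following is a minimal single-sample sketch of the general CosFace formulation, in which a margin is subtracted from the target-class cosine before scaling and softmax cross-entropy. The scale and margin values are illustrative assumptions, and the listing does not say how VMCML defines classes or applies the loss to video and music embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosface_loss(embedding, class_weights, target, scale=30.0, margin=0.35):
    """Cross-entropy over scaled cosines, with a margin on the target class.

    scale and margin are illustrative defaults, not values from the paper.
    """
    unit_embedding = l2_normalize(embedding)
    cosines = [
        sum(a * b for a, b in zip(unit_embedding, l2_normalize(weights)))
        for weights in class_weights
    ]
    # Subtract the margin only from the target-class cosine.
    logits = [
        scale * (cos - margin) if index == target else scale * cos
        for index, cos in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp.
    largest = max(logits)
    log_sum = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum - logits[target]


weights = [[1.0, 0.0], [0.0, 1.0]]
print(cosface_loss([2.0, 0.0], weights, target=0))
print(cosface_loss([2.0, 0.0], weights, target=0, margin=0.0))
```

With the margin in place, a correctly aligned embedding still incurs a slightly larger loss than it would without the margin, which is what pushes same-class embeddings closer together during training.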

De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hu, Zhizhang; Li, Shasha; Du, Ming; Dhua, Arnab; Gray, Douglas

Affiliations: Univ Calif Merced Merced CA 95343 USA; Amazon Visual Search & AR Seattle WA USA; Amazon Seattle WA USA

ISBN: 9798350365474 (print)

In e-commerce applications, vision-language multimodal transformer models play a pivotal role in product search. The key to successfully training a multimodal model lies in the alignment quality of image-text pairs in the dataset. However, the data in practice is often automatically collected with minimal manual intervention. Hence the alignment of image-text pairs is far from ideal. In e-commerce, this misalignment can stem from noisy and redundant non-visual-descriptive text attributes in the product description. To address this, we introduce the MultiModal alignment-guided Learned Token Pruning (MM-LTP) method. MM-LTP employs token pruning, conventionally used for computational efficiency, to perform online text cleaning during multimodal model training. By enabling the model to discern and discard unimportant tokens, it is able to train with implicitly cleaned image-text pairs. We evaluate MM-LTP using a benchmark multimodal e-commerce dataset comprising over 710,000 unique Amazon products. Our evaluation hinges on visual search, a prevalent e-commerce feature. Through MM-LTP, we demonstrate that refining text tokens enhances the paired image branch's training, which leads to significantly improved visual search performance.

Keywords: De-noised Fusion; Multimodal Learning; Token Pruning; Vision-language Model; Visual Search

Scaling Graph Convolutions for Mobile Vision

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Avery, William; Munir, Mustafa; Marculescu, Radu

Affiliations: Univ Texas Austin Austin TX 78712 USA

ISBN: 9798350365474 (print)

To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves a 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves an 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Besides image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP(box) and 0.7 AP(mask), and MobileViGv2-B outperforms MobileViG-B by 1.0 AP(box) and 0.7 AP(mask). For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.

Keywords: Computer Vision; Deep Learning; Edge AI; Graph Neural Networks

Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Yuning; Hassan, M. A.; He, Jiangpeng; Higgins, J.; McCrory, Megan; Eicher-Miller, Heather; Thomas, J. Graham; Sazonov, Edward; Zhu, Fengqing

Affiliations: Purdue Univ W Lafayette IN 47907 USA; Univ Calif Davis Davis CA 95616 USA; Univ Colorado Aurora CO USA; Boston Univ Boston MA 02215 USA; Brown Univ Providence RI 02912 USA; Univ Alabama Tuscaloosa AL USA

ISBN: 9798350365474 (print)

Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study", which uses an egocentric wearable camera, AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

Keywords: Classification; Dietary Assessment; Scene Recognition; Transfer Learning

Emotic Masked Autoencoder on Dual-views with Attention Fusion for Facial Expression Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xuan-Bach Nguyen; Hoang-Thien Nguyen; Thanh-Huy Nguyen; Nhu-Tai Do; Quang Vinh Dinh

Affiliations: Ho Chi Minh City Univ Technol Ho Chi Minh City Vietnam; Posts & Telecommun Inst Technol Ho Chi Minh City Vietnam; Ho Chi Minh City Univ Educ Ho Chi Minh City Vietnam; Univ Econ Ho Chi Minh City UEH Vietnam Ho Chi Minh City Vietnam; Vietnamese German Univ Binh Duong Vietnam

ISBN: 9798350365474 (print)

Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. By utilizing low-level feature information from the ipsilateral view (auxiliary view) before learning the high-level feature that emphasizes the shift in the human facial expression, our work seeks to provide a straightforward yet innovative way to improve the examined view (main view). We also suggest easy-to-implement and no-training frameworks aimed at highlighting key facial features to determine if such features can serve as guides for the model, focusing on pivotal local elements. The efficacy of this method is validated by improvements in model performance on the Aff-Wild2 dataset, as observed in both training and validation contexts.

Keywords: Self-supervised learning

Interpreting COVID Lateral Flow Tests' Results with Foundation Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Pandey, Stuti; Myers-Dean, Josh; Reynolds, Jarek; Gurari, Danna

Affiliations: Univ Colorado Boulder CO 80309 USA; Univ Texas Austin Austin TX 78712 USA

ISBN: 9798350365474 (print)

Lateral flow tests (LFTs) enable rapid, low-cost testing for health conditions including Covid, pregnancy, HIV, and malaria. Automated readers of LFT results can yield many benefits including empowering blind people to independently learn about their health and accelerating data entry for large-scale monitoring (e.g., for pandemics such as Covid) by using only a single photograph per LFT test. Accordingly, we explore the abilities of modern foundation vision language models (VLMs) in interpreting such tests. To enable this analysis, we first create a new labeled dataset with hierarchical segmentations of each LFT test and its nested test result window. We call this dataset LFT-Grounding. Next, we benchmark eight modern VLMs in zero-shot settings for analyzing these images. We demonstrate that current VLMs frequently fail to correctly identify the type of LFT test, interpret the test results, locate the nested result window of the LFT tests, and recognize LFT tests when they are partially obfuscated. To facilitate community-wide progress towards automated LFT reading, we publicly release our dataset at https://***/ lft_grounding_foundation_models/

Keywords: Accessibility; Foundation Vision Language Models; Lateral Flow Test; Prompt Engineering; Zero-Shot

QAttn: Efficient GPU Kernels for mixed-precision Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kluska, Piotr; Castello, Adrian; Scheidegger, Florian; Malossi, A. Cristiano I.; Quintana-Orti, Enrique S.

Affiliations: IBM Res Europe Ruschlikon Switzerland; Univ Politecn Valencia Valencia Spain

ISBN: 9798350365474 (print)

Vision Transformers have demonstrated outstanding performance in computer vision tasks. Nevertheless, this superior performance for large models comes at the expense of increasing memory usage for storing the parameters and intermediate activations. To accelerate model inference, in this work we develop and evaluate integer and mixed-precision kernels in Triton for the efficient execution of two fundamental building blocks of transformers (linear layer and attention) on graphics processing units (GPUs). On an NVIDIA A100 GPU, our kernel implementations of Vision Transformers achieve a throughput speedup of up to 7x compared with reference kernels in PyTorch floating-point single precision (FP32). Additionally, the accuracy for the ViT Large model top-1 drops by less than one percent on the ImageNet1K classification task. We also observe up to 6x increased throughput by applying our kernels to the Segment Anything Model image encoder while keeping the mIOU close to the FP32 reference on the COCO2017 dataset for static and dynamic quantization. Furthermore, our kernels demonstrate improved speed compared to the TensorRT INT8 linear layer, and we improve the throughput of base FP16 (half precision) Triton attention on average by up to 19 +/- 4.01%. We have open-sourced the QAttn framework, which is tightly integrated with the PyTorch quantization workflow: https://***/IBM/qattn.

Keywords: compression; instance segmentation; object classification; quantization; vision transformers
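The QAttn abstract above refers to integer kernels for the linear layer with dynamic quantization. As a rough illustration of that general idea only, the sketch below applies symmetric per-tensor INT8 quantization to a tiny linear layer in plain Python: values are mapped to integers in [-127, 127], multiplied and accumulated as integers, and rescaled to floating point at the end. This is not the QAttn implementation and does not use Triton or PyTorch; the function names and the per-tensor scaling choice are assumptions made for the example.

```python
def quantize_int8(values, scale=None):
    """Symmetric quantization: map floats to integers in [-127, 127].

    When no scale is given, derive it from the largest magnitude
    (a dynamic, per-tensor choice made for this sketch).
    """
    if scale is None:
        max_abs = max(abs(v) for v in values)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_linear(inputs, weight_rows):
    """Compute weight_rows @ inputs using integer multiply-accumulate."""
    q_inputs, input_scale = quantize_int8(inputs)
    # One shared scale for the whole weight matrix.
    _, weight_scale = quantize_int8([w for row in weight_rows for w in row])
    outputs = []
    for row in weight_rows:
        q_row, _ = quantize_int8(row, scale=weight_scale)
        accumulator = sum(a * b for a, b in zip(q_inputs, q_row))  # integer sum
        # Rescale the integer accumulator back to floating point.
        outputs.append(accumulator * input_scale * weight_scale)
    return outputs


x = [0.5, -1.0, 0.25]
w = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
print(int8_linear(x, w))  # close to the float result [0.25, -0.125]
```

The small gap between the printed values and the exact float result is the quantization error, which is the accuracy cost that such kernels trade for lower memory use and higher throughput.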

Pseudo-label based unsupervised fine-tuning of a monocular 3D pose estimation model for sports motions

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Suzuki, Tomohiro; Tanaka, Ryota; Takeda, Kazuya; Fujii, Keisuke

Affiliations: Nagoya Univ Nagoya Aichi Japan

ISBN: 9798350365474 (print)

Accurate motion capture is useful for sports motion analysis, but requires higher acquisition costs. Monocular or few camera multi-view pose estimation provides an accessible but less accurate alternative, especially for sports motion, due to training on datasets of daily activities. In addition, multi-view estimation is still costly due to camera calibration. Therefore, it is desirable to develop an accurate and cost-effective motion capture system for the daily training in sports. In this paper, we propose an accurate and convenient sports motion capture system based on unsupervised fine-tuning. The proposed system estimates 3D joint positions by multi-view estimation based on automatic calibration with the human body. These results are used as pseudo-labels for fine-tuning of the recent higher performance monocular 3D pose estimation model. Since the fine-tuning improves the model accuracy for sports motion, we can choose multi-view or monocular estimation depending on the situation. We evaluated the system using a running motion dataset and ASPset-510, and showed that fine-tuning improved the performance of monocular estimation to the same level as that of multi-view estimation for running motion. Our proposed system can be useful for the daily motion analysis in sports.

Keywords: Computer Vision; Pose Estimation; Running; Sports

Knowledge Distillation for Efficient Instance Semantic Segmentation with Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Maohui; Halstead, Michael; McCool, Chris

Affiliations: Univ Bonn Bonn Germany; Lamarr Inst Machine Learning & Artificial Intelli Dortmund Germany

ISBN: 9798350365474 (print)

Instance-based semantic segmentation provides detailed per-pixel scene understanding information crucial for both computer vision and robotics applications. However, state-of-the-art approaches such as Mask2Former are computationally expensive and reducing this computational burden while maintaining high accuracy remains challenging. Knowledge distillation has been regarded as a potential way to compress neural networks, but to date limited work has explored how to apply this to distill information from the output queries of a model such as Mask2Former. In this paper, we match the output queries of the student and teacher models to enable a query-based knowledge distillation scheme. We independently match the teacher and the student to the groundtruth and use this to define the teacher to student relationship for knowledge distillation. Using this approach we show that it is possible to perform knowledge distillation where the student models can have a lower number of queries and the backbone can be changed from a Transformer architecture to a convolutional neural network architecture. Experiments on two challenging agricultural datasets, sweet pepper (BUP20) and sugar beet (SB20), and Cityscapes demonstrate the efficacy of our approach. Across the three datasets the student models obtain an average absolute performance improvement in AP of 1.8 and 1.9 points for ResNet-50 and Swin-Tiny backbone respectively. To the best of our knowledge, this is the first work to propose knowledge distillation schemes for instance semantic segmentation with transformer-based models.

Keywords: Computer Vision for Agriculture Automation; Knowledge Distillation; Efficient Instance Segmentation; Transformer

Potential Risk Localization via Weak Labeling out of Blind Spot

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Shimomura, Kota; Hirakawa, Tsubasa; Yamashita, Takayoshi; Fujiyoshi, Hironobu

Affiliations: Chubu Univ Kasugai Aichi Japan

ISBN: 9798350365474 (print)

Achieving fully autonomous driving requires not only understanding the current surrounding conditions but also predicting how objects that could lead to potential risks may change in the future. Predicting potential risk regions, especially where pedestrians or vehicles might suddenly appear, is crucial for safe autonomous driving and accident avoidance. Constructing datasets annotated with potential risk regions is costly. Therefore, conventional methods have proposed blind spot estimation using depth maps or segmentation masks through automatic labeling. However, these methods are limited in applicability due to their reliance on camera parameters or point clouds. In this study, we propose a method to automatically generate labels from depth maps and segmentation masks and estimate potential risk regions in 2D. Our automatic labeling algorithm relies solely on images, making it applicable to all onboard camera datasets. To demonstrate the effectiveness of our approach, we define regions where pedestrians or vehicles might emerge from blind spots as potential risk regions and annotate them to create a new dataset extended with potential risk region annotations. Experiments using the Cityscapes Dataset show that weakly training with labels generated by our proposed method achieves equal or superior accuracy compared with supervised training with manually annotated ground truth (GT). Furthermore, experiments using the Mapillary Vistas Dataset and BDD100K Dataset demonstrate the versatility of our approach.

Keywords: Advanced Driver-Assistance Systems; Autonomous Driving; Computer Vision; Deep Learning

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
