ISBN (Print): 9798350365474
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task learning, continual learning, and distillation. Further, it demands significantly less computational cost compared to traditional multi-task training from scratch, and it only needs a small fraction of the pre-training datasets that were initially used to train the individual models. By applying our method to SAM and CLIP, we obtain SAM-CLIP: a unified model that combines the capabilities of SAM and CLIP into a single vision transformer. Compared with deploying SAM and CLIP independently, our merged model, SAM-CLIP, reduces storage and compute costs for inference, making it well-suited for edge device applications. We show that SAM-CLIP not only retains the foundational strengths of SAM and CLIP, but also introduces synergistic functionalities, notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvements on the Pascal VOC and COCO-Stuff datasets, respectively.
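To make the merging recipe above more concrete, the following is a minimal, hypothetical sketch of multi-teacher distillation into a shared student backbone, written in PyTorch. The module sizes, stand-in teachers, and loss weighting are illustrative assumptions, not the SAM-CLIP implementation.

```python
# Hypothetical sketch of multi-teacher distillation for merging two vision
# backbones into one student, in the spirit described above. Module sizes,
# stand-in teachers, and the loss weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergedStudent(nn.Module):
    def __init__(self, dim=768, clip_dim=512, sam_dim=256):
        super().__init__()
        # Tiny MLP stands in for a shared ViT trunk.
        self.backbone = nn.Sequential(nn.Linear(3 * 224 * 224, dim), nn.GELU(),
                                      nn.Linear(dim, dim))
        self.clip_head = nn.Linear(dim, clip_dim)   # semantic embedding head
        self.sam_head = nn.Linear(dim, sam_dim)     # spatial/mask-feature head

    def forward(self, x):
        feats = self.backbone(x.flatten(1))
        return self.clip_head(feats), self.sam_head(feats)

def distill_step(student, clip_teacher, sam_teacher, images, optimizer, w=0.5):
    """One multi-task distillation step: match both frozen teachers."""
    with torch.no_grad():
        clip_target = clip_teacher(images)
        sam_target = sam_teacher(images)
    clip_pred, sam_pred = student(images)
    # Cosine distillation for the semantic head, L2 for the spatial head.
    loss_clip = 1 - F.cosine_similarity(clip_pred, clip_target, dim=-1).mean()
    loss_sam = F.mse_loss(sam_pred, sam_target)
    loss = w * loss_clip + (1 - w) * loss_sam
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = MergedStudent()
    # Stand-in "teachers": frozen random projections with matching output dims.
    clip_teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512)).eval()
    sam_teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 256)).eval()
    opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
    images = torch.randn(4, 3, 224, 224)
    print(distill_step(student, clip_teacher, sam_teacher, images, opt))
```

In practice the student would be a ViT initialized from one of the teachers and the teachers would be the frozen SAM and CLIP encoders; the sketch only shows how the two distillation losses share one backbone.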
ISBN (Print): 9798350365474
This paper introduces our solution for Track 2 in the AI City Challenge 2024. The task aims to solve traffic safety description and analysis with the Woven Traffic Safety (WTS) dataset, a real-world pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding. Our solution mainly focuses on the following points: 1) To solve dense video captioning, we leverage the framework of dense video captioning with parallel decoding (PDVC) to model visual-language sequences and generate dense captions by chapter for each video. 2) We leverage CLIP to extract visual features so that cross-modality training between visual and textual representations can be performed more efficiently. 3) We conduct domain-specific model adaptation to mitigate the domain-shift problem that poses a recognition challenge in video understanding. 4) Moreover, we leverage BDD-5K captioned videos to conduct knowledge transfer for a better understanding of WTS videos and more accurate captioning. Our solution achieved 6th place in the competition on the test set. The open-source code will be available at https://***/UCF-SST-Lab/AICity2024CVPRW
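As a rough illustration of the CLIP feature-extraction step mentioned in point 2, the following sketch uses the Hugging Face transformers CLIP implementation to embed sampled video frames; the checkpoint name and frame sampling are assumptions, and the PDVC decoder itself is not shown.

```python
# Minimal sketch of per-frame CLIP visual feature extraction for a video,
# assuming the Hugging Face `transformers` CLIP implementation. The checkpoint
# name and downstream use with a PDVC-style decoder are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_frame_features(frames):
    """frames: list of PIL.Image sampled from a video -> (T, D) feature tensor."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)    # (T, 512)
    return feats / feats.norm(dim=-1, keepdim=True)   # L2-normalize

if __name__ == "__main__":
    # Dummy frames stand in for frames decoded from a WTS/BDD clip.
    frames = [Image.new("RGB", (224, 224)) for _ in range(8)]
    feats = clip_frame_features(frames)
    print(feats.shape)  # torch.Size([8, 512])
```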
ISBN (Print): 9798350365474
This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The challenge task focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve a good trade-off between task accuracy and efficiency. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye-tracking research.
ISBN (Print): 9798350365474
Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where the context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improvement for unseen classes is still marginal, and to tackle this problem, data augmentation has been frequently used in traditional zero-shot learning techniques. Through our experiments, we have identified an important issue in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes. To address this problem, we propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts. Through our novel mechanism, called "Adding Attributes to Prompt Learning" (AAPL), we guide the learnable context to effectively extract text features by focusing on high-level features for unseen classes. We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performance compared to existing methods on few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
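For readers unfamiliar with prompt learning, the following is a bare-bones sketch of CoOp-style learnable context vectors prepended to class embeddings and scored against image features; the toy text encoder and dimensions are assumptions, and AAPL's adversarial token embedding and attribute mechanism are not reproduced.

```python
# Bare-bones sketch of CoOp-style prompt learning: a shared set of learnable
# context vectors is prepended to each class embedding and scored against image
# features. The toy encoders and dimensions are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    def __init__(self, num_classes, ctx_len=4, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)  # learnable context
        self.class_emb = nn.Embedding(num_classes, dim)            # stands in for fixed class-name token embeddings
        self.text_encoder = nn.GRU(dim, dim, batch_first=True)     # stand-in for a CLIP-like text encoder

    def forward(self):
        n = self.class_emb.num_embeddings
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)              # (N, ctx_len, D)
        cls = self.class_emb.weight.unsqueeze(1)                   # (N, 1, D)
        prompts = torch.cat([ctx, cls], dim=1)                     # "[V1]...[Vm] CLASS"
        _, h = self.text_encoder(prompts)
        return F.normalize(h.squeeze(0), dim=-1)                   # (N, D) class prototypes

def logits(image_feats, prompt_learner, temperature=0.07):
    text_feats = prompt_learner()
    image_feats = F.normalize(image_feats, dim=-1)
    return image_feats @ text_feats.t() / temperature

if __name__ == "__main__":
    learner = PromptLearner(num_classes=10)
    img = torch.randn(4, 512)                                      # pretend image features
    print(logits(img, learner).shape)                              # torch.Size([4, 10])
```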
ISBN (Print): 9798350365474
In an age dominated by resource-intensive foundation models, the ability to efficiently adapt to downstream tasks is crucial. Visual Prompting (VP), drawing inspiration from the prompting techniques employed in Large Language Models (LLMs), has emerged as a pivotal method for transfer learning in computer vision. As the importance of efficiency continues to rise, research into model compression has become indispensable for alleviating the computational burdens associated with training and deploying over-parameterized neural networks. A primary objective in model compression is to develop sparse and/or quantized models capable of matching or even surpassing the performance of their over-parameterized, full-precision counterparts. Although previous studies have explored the effects of model compression on transfer learning, its impact on visual prompting-based transfer remains unclear. This study aims to bridge this gap, showing that model compression detrimentally impacts the performance of visual prompting-based transfer, particularly in scenarios with low data volume. Furthermore, our findings underscore the adverse influence of sparsity on the calibration of downstream visual-prompted models. Intriguingly, however, we also show that such negative effects on calibration are not present when models are compressed via quantization. This empirical investigation underscores the need for a nuanced understanding beyond mere accuracy in sparse and quantized settings, thereby paving the way for further exploration of Visual Prompting techniques tailored to sparse and quantized models.
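The following sketch illustrates the two ingredients being combined: a padding-style visual prompt trained on top of a frozen backbone, and post-training dynamic quantization of that backbone. The toy backbone, prompt geometry, and quantization choice are assumptions made to show the mechanics, not the paper's experimental setup.

```python
# Illustrative sketch of padding-style visual prompting on a frozen backbone,
# with optional dynamic quantization of that backbone. All model and size
# choices here are assumptions for illustration.
import torch
import torch.nn as nn

class PadPrompt(nn.Module):
    """Learnable border of width `pad` added around each (resized) input image."""
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.top = nn.Parameter(torch.zeros(1, 3, pad, image_size))
        self.bottom = nn.Parameter(torch.zeros(1, 3, pad, image_size))
        self.left = nn.Parameter(torch.zeros(1, 3, image_size - 2 * pad, pad))
        self.right = nn.Parameter(torch.zeros(1, 3, image_size - 2 * pad, pad))

    def forward(self, x):
        b = x.shape[0]
        mid = torch.cat([self.left.expand(b, -1, -1, -1), x,
                         self.right.expand(b, -1, -1, -1)], dim=3)
        return torch.cat([self.top.expand(b, -1, -1, -1), mid,
                          self.bottom.expand(b, -1, -1, -1)], dim=2)

# Frozen "backbone" (a stand-in for a pre-trained classifier).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Compressed copy of the backbone: dynamic int8 quantization of Linear layers.
quantized_backbone = torch.quantization.quantize_dynamic(
    backbone, {nn.Linear}, dtype=torch.qint8)

prompt = PadPrompt()
x = torch.randn(2, 3, 192, 192)          # inner image; prompted result is 224x224

# Training-time path: float backbone, gradients flow only into the prompt.
backbone(prompt(x)).sum().backward()

# Deployment-time path: the same learned prompt with the compressed backbone.
with torch.no_grad():
    out = quantized_backbone(prompt(x))
print(out.shape)                         # torch.Size([2, 1000])
```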
ISBN (Print): 9798350365474
In the field of robotics and autonomous navigation, accurate pixel-level depth estimation has gained significant importance. Event cameras, or dynamic vision sensors, capture asynchronous changes in brightness at the pixel level, offering benefits such as high temporal resolution, no motion blur, and a wide dynamic range. However, unlike traditional cameras that measure absolute intensity, event cameras lack the ability to provide scene context. Efficiently combining the advantages of asynchronous events and synchronous RGB images to enhance depth estimation remains a challenge. In our study, we introduce a unified transformer that combines both event and RGB modalities to achieve precise depth prediction. In contrast to separate transformers for each input modality, a unified transformer model captures inter-modal dependencies and uses self-attention to enhance event-RGB contextual interactions. This approach exceeds the performance of the recurrent neural network (RNN) methods used in state-of-the-art models. To encode the temporal information from events, convLSTMs are used before the transformer to improve depth estimation. Our proposed architecture outperforms existing approaches in terms of absolute mean depth error, achieving state-of-the-art results in most cases. Improvements are also observed on other metrics, such as RMSE, absolute relative difference, and depth thresholds, compared to existing approaches. The source code is available at: https://***/anusha-devulapally/ER-F2D
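As an illustration of what a "unified" transformer over both modalities could look like, the sketch below concatenates event and RGB patch tokens so self-attention can model cross-modal interactions; patch sizes, dimensions, and the linear depth head are assumptions, and the convLSTM temporal encoder is omitted for brevity.

```python
# Simplified sketch of a unified transformer over concatenated event and RGB
# tokens. Patch sizes, embedding dims, and the depth head are assumptions; the
# convLSTM temporal encoder used before the transformer is not modeled.
import torch
import torch.nn as nn

class UnifiedEventRGBDepth(nn.Module):
    def __init__(self, img=224, patch=16, dim=256, heads=4, layers=4, event_bins=5):
        super().__init__()
        n = (img // patch) ** 2
        self.rgb_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.evt_embed = nn.Conv2d(event_bins, dim, kernel_size=patch, stride=patch)
        self.mod_emb = nn.Parameter(torch.zeros(2, 1, dim))   # modality embeddings
        self.pos_emb = nn.Parameter(torch.zeros(1, 2 * n, dim))
        enc = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim,
                                         batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.depth_head = nn.Linear(dim, patch * patch)       # per-patch depth values
        self.img, self.patch = img, patch

    def forward(self, rgb, events):
        b = rgb.shape[0]
        r = self.rgb_embed(rgb).flatten(2).transpose(1, 2) + self.mod_emb[0]
        e = self.evt_embed(events).flatten(2).transpose(1, 2) + self.mod_emb[1]
        tokens = torch.cat([r, e], dim=1) + self.pos_emb
        fused = self.encoder(tokens)                          # cross-modal self-attention
        n = r.shape[1]
        depth = self.depth_head(fused[:, :n])                 # use the RGB half of the tokens
        g = self.img // self.patch
        depth = depth.view(b, g, g, self.patch, self.patch)
        depth = depth.permute(0, 1, 3, 2, 4).reshape(b, 1, self.img, self.img)
        return depth

if __name__ == "__main__":
    model = UnifiedEventRGBDepth()
    rgb = torch.randn(2, 3, 224, 224)
    events = torch.randn(2, 5, 224, 224)   # voxel-grid event representation
    print(model(rgb, events).shape)        # torch.Size([2, 1, 224, 224])
```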
ISBN (Print): 9798350365474
The adoption of vision Transformer (ViT) based architectures represents a significant advancement in 3D Medical Image (MI) segmentation, surpassing traditional Convolutional Neural Network (CNN) models by enhancing global contextual understanding. While this paradigm shift has significantly improved 3D segmentation performance, state-of-the-art models are extremely large and complex and require large-scale computing resources for training and deployment. Furthermore, in the context of limited datasets, often encountered in medical imaging, larger models can present hurdles in both model generalization and convergence. In response to these challenges, and to demonstrate that lightweight models are a valuable area of research in 3D medical imaging, we present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. Additionally, SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features to produce highly accurate segmentation masks. The proposed memory-efficient Transformer preserves the performance characteristics of a significantly larger model in a compact design. SegFormer3D democratizes deep learning for 3D medical image segmentation by offering a model with 33x fewer parameters and a 13x reduction in GFLOPS compared to the current state-of-the-art (SOTA). We benchmark SegFormer3D against current SOTA models on three widely used datasets, Synapse, BraTS, and ACDC, achieving competitive results. Code: https://***/OSUPCVLab/***
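A minimal sketch of an all-MLP decoder over multi-scale 3D features, in the spirit described above: each scale is linearly projected, trilinearly upsampled to a common resolution, concatenated, and fused before a lightweight segmentation head. Channel counts and scales are assumptions, not SegFormer3D's actual configuration.

```python
# Sketch of an "all-MLP" decoder over multi-scale 3D feature maps. Channel
# counts, scales, and the 1x1x1 fusion are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AllMLPDecoder3D(nn.Module):
    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=128, num_classes=4):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(c, embed_dim) for c in in_channels)
        self.fuse = nn.Sequential(nn.Conv3d(embed_dim * len(in_channels), embed_dim, 1),
                                  nn.GELU())
        self.head = nn.Conv3d(embed_dim, num_classes, 1)

    def forward(self, feats):
        # feats: list of (B, C_i, D_i, H_i, W_i) volumes from fine to coarse.
        target = feats[0].shape[2:]                  # upsample everything to scale 0
        outs = []
        for f, proj in zip(feats, self.proj):
            b, c, d, h, w = f.shape
            f = proj(f.flatten(2).transpose(1, 2))   # per-voxel linear projection
            f = f.transpose(1, 2).view(b, -1, d, h, w)
            outs.append(F.interpolate(f, size=target, mode="trilinear",
                                      align_corners=False))
        return self.head(self.fuse(torch.cat(outs, dim=1)))

if __name__ == "__main__":
    B = 1
    feats = [torch.randn(B, 32, 32, 32, 32), torch.randn(B, 64, 16, 16, 16),
             torch.randn(B, 160, 8, 8, 8), torch.randn(B, 256, 4, 4, 4)]
    print(AllMLPDecoder3D()(feats).shape)            # torch.Size([1, 4, 32, 32, 32])
```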
ISBN (Print): 9798350365474
Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low latency, low power consumption, and precision. Yet, achieving optimal performance across all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolutional neural networks (SCNN). The SCNN implemented on the accelerator efficiently extracts an embedding feature vector from each event-slice representation by processing only the non-zero activations. Subsequently, these vectors undergo further processing by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and a mean Euclidean distance of 3.71 with 0.7 ms latency while consuming only 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://***/CASRHKU/ESDA/tree/eye_tracking
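The host-side temporal head described above can be sketched as a GRU over per-slice embeddings followed by a fully connected regression layer; the embedding size, hidden size, and normalized-coordinate output are assumptions, and the FPGA sparse-convolution front end is not modeled.

```python
# Sketch of the host-side temporal head: a GRU consumes per-slice embedding
# vectors (which the FPGA sparse CNN would produce) and a fully connected
# layer regresses the pupil center. Sizes and the output range are assumptions.
import torch
import torch.nn as nn

class EyeCenterHead(nn.Module):
    def __init__(self, emb_dim=128, hidden=128):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 2)            # (x, y) pupil center

    def forward(self, embeddings, h0=None):
        # embeddings: (B, T, emb_dim), one feature vector per event slice.
        out, h = self.gru(embeddings, h0)
        centers = torch.sigmoid(self.fc(out))     # normalized coords in [0, 1]
        return centers, h                         # carry h for streaming inference

if __name__ == "__main__":
    head = EyeCenterHead()
    slices = torch.randn(1, 10, 128)              # 10 event-slice embeddings
    centers, h = head(slices)
    print(centers.shape)                          # torch.Size([1, 10, 2])
```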
ISBN (Print): 9798350365474
Irrigation mapping plays a crucial role in effective water management, which is essential for preserving both water quality and quantity, and is key to mitigating the global issue of water scarcity. The complexity of agricultural fields with diverse irrigation practices, especially when multiple systems coexist in close quarters, poses a unique challenge. This complexity is further compounded by the nature of Landsat's remote sensing data, where each pixel is rich with densely packed information, complicating the task of accurate irrigation mapping. In this study, we introduce an approach that employs a progressive training method, which strategically increases patch sizes throughout the training process, using Landsat 5 and 7 imagery labeled with the WRLU dataset. Starting with small patches allows the model to capture detailed features, progressively shifting to broader, more general features as the patch size enlarges. Remarkably, our method enhances the performance of existing state-of-the-art models by approximately 20%. Furthermore, our analysis examines the significance of incorporating various spectral bands into the model and assesses their impact on performance. The findings reveal that additional bands help the model discern finer details more effectively. This work sets a new standard for leveraging remote sensing imagery in irrigation mapping.
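The progressive training idea can be sketched as a simple schedule that enlarges the training patch size across stages while reusing the same fully convolutional model; the patch sizes, epochs per stage, and toy model and data below are assumptions, not the paper's configuration.

```python
# Sketch of a progressive patch-size schedule: train the same fully
# convolutional model on successively larger patches so it first learns fine
# local detail and later broader context. All sizes and data are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(                       # toy per-pixel classifier (bands -> classes)
    nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 1))                     # e.g. 3 irrigation classes
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def random_patch_batch(patch_size, batch=8, bands=6, num_classes=3):
    """Stand-in for sampling Landsat patches and WRLU labels at a given size."""
    x = torch.randn(batch, bands, patch_size, patch_size)
    y = torch.randint(0, num_classes, (batch, patch_size, patch_size))
    return x, y

schedule = [(32, 2), (64, 2), (128, 2)]      # (patch size, epochs) per stage
for patch_size, epochs in schedule:
    for _ in range(epochs):
        x, y = random_patch_batch(patch_size)
        loss = criterion(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"stage {patch_size}px done, last loss {loss.item():.3f}")
```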
ISBN (Print): 9798350365474
Camera-based remote photoplethysmography (rPPG) enables contactless measurement of important physiological signals such as pulse rate (PR). However, dynamic and unconstrained subject motion introduces significant variability into the facial appearance in video, confounding the ability of video-based methods to accurately extract the rPPG signal. In this study, we leverage the 3D facial surface to construct a novel orientation-conditioned facial texture video representation that improves the motion robustness of existing video-based facial rPPG estimation methods. Our proposed method achieves a significant 18.2% performance improvement in cross-dataset testing on MMPD over our baseline using the PhysNet model trained on PURE, highlighting the efficacy and generalization benefits of our video representation. We demonstrate significant performance improvements of up to 29.6% in all tested motion scenarios in cross-dataset testing on MMPD, even in the presence of dynamic and unconstrained subject motion, emphasizing the benefits of disentangling motion through modeling the 3D facial surface for motion-robust facial rPPG estimation. We validate the efficacy of our design decisions and the impact of different video processing steps through an ablation study. Our findings illustrate the potential of exploiting the 3D facial surface as a general strategy for addressing dynamic and unconstrained subject motion in videos. The code is available at https://***/orientation-uv-rppg/
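As a toy illustration of orientation conditioning, the sketch below down-weights texels of a UV-unwrapped face texture according to how directly their surface normals face the camera; the normal-map source, threshold, and weighting scheme are assumptions for illustration only and do not reproduce the paper's representation.

```python
# Toy sketch of orientation conditioning: given per-pixel surface normals from
# a fitted 3D face model (obtained elsewhere), suppress texture pixels whose
# surface faces away from the camera. Threshold and weighting are assumptions.
import numpy as np

def orientation_conditioned_texture(texture, normals, view_dir=(0.0, 0.0, 1.0),
                                    min_cos=0.2):
    """texture: (H, W, 3) UV-unwrapped face texture for one frame.
    normals: (H, W, 3) unit surface normals at the same UV locations."""
    v = np.asarray(view_dir, dtype=np.float32)
    v = v / np.linalg.norm(v)
    cos = np.clip((normals * v).sum(-1), 0.0, 1.0)   # alignment with the camera
    weight = np.where(cos >= min_cos, cos, 0.0)      # suppress grazing/occluded texels
    return texture * weight[..., None], weight

if __name__ == "__main__":
    H, W = 64, 64
    texture = np.random.rand(H, W, 3).astype(np.float32)
    normals = np.zeros((H, W, 3), dtype=np.float32)
    normals[..., 2] = 1.0                            # toy case: everything faces the camera
    conditioned, weight = orientation_conditioned_texture(texture, normals)
    print(conditioned.shape, float(weight.mean()))   # (64, 64, 3) 1.0
```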