ISBN (Print): 0769512720
conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title-page or half title-pages, blank pages, venue maps or other general information relating to the conference that was part of the original conference proceedings.
Continuous sign language recognition (CSLR) aims to identify a sequence of glosses from a sign language video with only a sentence-level label provided in a weakly supervised way. In sign language videos, the transitions among actions are naturally fluent, and different glosses or the same gloss correspond to video clips with various temporal scales. These factors clearly pose a challenge to the effective extraction of complex temporal information. However, most previous deep learning-based CSLR methods employ temporal modeling with a fixed temporal receptive field, which is a simple and effective solution but does not cope well with video clips that have various temporal scales. To relieve this problem, we propose a dual-stage temporal perception module (DTPM) that leverages the strengths of both temporal convolutions and transformers, following a hierarchical structure with dual stages aimed at capturing richer and more comprehensive temporal features. Specifically, each stage of the DTPM is composed of two parts: a multi-scale local temporal module (MS-LTM), followed by a set of global-local temporal modules (GLTMs), where each GLTM can be further decomposed into a global temporal relational module (GTRM) and a local temporal relational module (LTRM). At each stage, an MS-LTM is first employed to model multi-scale local temporal relations, and a set of GLTMs is then utilized to model global temporal relations and strengthen local temporal relations. We finally aggregate the output features of each stage to form a video feature representation with rich semantic information. Extensive experiments on three CSLR benchmarks, PHOENIX14 (Koller et al., Comput Vis Image Underst 141:108-125, 2015), PHOENIX14-T (Camgoz et al., in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7784-7793, 2018), and CSL (Huang et al., in: Proceedings of the AAAI Conference on Artificial Intelligence, pp 32, 2018), validate the effectiveness of the proposed method.
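For a concrete picture of the hierarchy described in this abstract, the following PyTorch sketch mirrors its structure (per stage, an MS-LTM followed by GLTMs, with stage outputs aggregated). The kernel sizes, channel width, attention design, and summation-based fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MSLTM(nn.Module):
    """Multi-scale local temporal module: parallel 1D convolutions with
    different kernel sizes (assumed scales), fused by summation."""
    def __init__(self, dim, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes])

    def forward(self, x):               # x: (batch, dim, time)
        return sum(branch(x) for branch in self.branches)

class GLTM(nn.Module):
    """Global-local temporal module: self-attention as a stand-in for the
    global temporal relational module (GTRM) and a depthwise convolution
    as a stand-in for the local temporal relational module (LTRM)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv1d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x):               # x: (batch, dim, time)
        t = x.transpose(1, 2)           # (batch, time, dim) for attention
        g, _ = self.attn(t, t, t)
        return x + g.transpose(1, 2) + self.local(x)

class DTPMSketch(nn.Module):
    """Two stages, each MS-LTM -> GLTMs; stage outputs are aggregated."""
    def __init__(self, dim=512, gltms_per_stage=2):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(MSLTM(dim),
                          *[GLTM(dim) for _ in range(gltms_per_stage)])
            for _ in range(2)])

    def forward(self, x):               # x: frame-level features (batch, dim, time)
        outputs = []
        for stage in self.stages:
            x = stage(x)
            outputs.append(x)
        return sum(outputs)             # aggregated video feature representation

features = torch.randn(2, 512, 100)     # e.g. 100 frames of 512-d features
print(DTPMSketch()(features).shape)     # torch.Size([2, 512, 100])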
Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To addr...
Medical image segmentation tasks are often intricate and require medical domain expertise. Recent advancements in deep learning have expedited these demanding tasks, transitioning from specialized models tailored to e...
The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions....
In this paper, we present an enhanced medical image segmentation approach leveraging the nnUNet framework, specifically tailored to integrate bounding box prompts for improved segmentation accuracy in resource-constra...
The four papers in this special section are extended versions of award-winning papers from the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007).
ISBN (Print): 9781467369640
The trifocal tensor, which describes the relation between projections of points and lines in three views, is a fundamental entity of geometric computer vision. In this work, we investigate a new parametrization of the trifocal tensor for calibrated cameras with non-collinear pinholes, obtained from a quotient Riemannian manifold. We incorporate this formulation into state-of-the-art methods for optimization on manifolds and show, through experiments in pose averaging, that it produces a meaningful way to measure distances between trifocal tensors.
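As background for this abstract, the incidence relations that the trifocal tensor encodes can be written in their standard textbook form (e.g., Hartley and Zisserman); this is general context, not material from the paper. With the three 3x3 slices T_1, T_2, T_3 of the tensor,

l_i = \mathbf{l}'^{\top}\, \mathsf{T}_i\, \mathbf{l}'', \qquad
[\mathbf{x}']_{\times}\left(\sum_{i=1}^{3} x_i\, \mathsf{T}_i\right)[\mathbf{x}'']_{\times} = \mathbf{0}_{3\times 3},

where l' and l'' are corresponding lines in the second and third views transferred to the line l in the first view, x <-> x' <-> x'' are corresponding points, and [.]_x denotes the cross-product (skew-symmetric) matrix.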