ISBN (Print): 0769512720
conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title-page or half title-pages, blank pages, venue maps or other general information relating to the conference that was part of the original conference proceedings.
Continuous sign language recognition (CSLR) aims to identify a sequence of glosses from a sign language video with only a sentence-level label provided in a weakly supervised way. In sign language videos, the transitions among actions are naturally fluent, and different glosses or the same gloss correspond to video clips with various temporal scales. These factors clearly pose a challenge to the effective extraction of complex temporal information. However, most previous deep learning-based CSLR methods employ temporal modeling with a fixed temporal receptive field, which is a simple and effective solution but does not cope well with video clips that have various temporal scales. To relieve this problem, we propose a dual-stage temporal perception module (DTPM) that leverages the strengths of both temporal convolutions and transformers, following a hierarchical structure with dual stages aimed at capturing richer and more comprehensive temporal features. Specifically, each stage of the DTPM is composed of two parts: a multi-scale local temporal module (MS-LTM), followed by a set of global-local temporal modules (GLTMs), where each GLTM can be further decomposed into a global temporal relational module (GTRM) and a local temporal relational module (LTRM). At each stage, an MS-LTM is first employed to model multi-scale local temporal relations, and a set of GLTMs is then utilized to model global temporal relations and strengthen local temporal relations. We finally aggregate the output features of each stage to form a video feature representation with rich semantic information. Extensive experiments on three CSLR benchmarks, PHOENIX14 (Koller et al., Comput Vis Image Underst 141:108-125, 2015), PHOENIX14-T (Camgoz et al., in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7784-7793, 2018), and CSL (Huang et al., in: Proceedings of the AAAI Conference on Artificial Intelligence, pp 32, 2018), validate the effectiveness of the proposed method.
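For a concrete picture of the hierarchy described in this abstract, the following PyTorch sketch mirrors its structure (per stage, an MS-LTM followed by GLTMs, with stage outputs aggregated). The kernel sizes, channel width, attention design, and summation-based fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MSLTM(nn.Module):
    """Multi-scale local temporal module: parallel 1D convolutions with
    different kernel sizes (assumed scales), fused by summation."""
    def __init__(self, dim, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes])

    def forward(self, x):               # x: (batch, dim, time)
        return sum(branch(x) for branch in self.branches)

class GLTM(nn.Module):
    """Global-local temporal module: self-attention as a stand-in for the
    global temporal relational module (GTRM) and a depthwise convolution
    as a stand-in for the local temporal relational module (LTRM)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv1d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x):               # x: (batch, dim, time)
        t = x.transpose(1, 2)           # (batch, time, dim) for attention
        g, _ = self.attn(t, t, t)
        return x + g.transpose(1, 2) + self.local(x)

class DTPMSketch(nn.Module):
    """Two stages, each MS-LTM -> GLTMs; stage outputs are aggregated."""
    def __init__(self, dim=512, gltms_per_stage=2):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(MSLTM(dim),
                          *[GLTM(dim) for _ in range(gltms_per_stage)])
            for _ in range(2)])

    def forward(self, x):               # x: frame-level features (batch, dim, time)
        outputs = []
        for stage in self.stages:
            x = stage(x)
            outputs.append(x)
        return sum(outputs)             # aggregated video feature representation

features = torch.randn(2, 512, 100)     # e.g. 100 frames of 512-d features
print(DTPMSketch()(features).shape)     # torch.Size([2, 512, 100])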
Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To addr...
Medical image segmentation tasks are often intricate and require medical domain expertise. Recent advancements in deep learning have expedited these demanding tasks, transitioning from specialized models tailored to e...
The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions....
In this paper, we present an enhanced medical image segmentation approach leveraging the nnUNet framework, specifically tailored to integrate bounding box prompts for improved segmentation accuracy in resource-constra...
The four papers in this special section are extended versions of award-winning papers from the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007).
ISBN (Print): 9781467369640
The trifocal tensor, which describes the relation between projections of points and lines in three views, is a fundamental entity of geometric computer vision. In this work, we investigate a new parametrization of the trifocal tensor for calibrated cameras with non-collinear pinholes, obtained from a quotient Riemannian manifold. We incorporate this formulation into state-of-the-art methods for optimization on manifolds and show, through experiments in pose averaging, that it produces a meaningful way to measure distances between trifocal tensors.
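As background for this abstract, the incidence relations that the trifocal tensor encodes can be written in their standard textbook form (e.g., Hartley and Zisserman); this is general context, not material from the paper. With the three 3x3 slices T_1, T_2, T_3 of the tensor,

l_i = \mathbf{l}'^{\top}\, \mathsf{T}_i\, \mathbf{l}'', \qquad
[\mathbf{x}']_{\times}\left(\sum_{i=1}^{3} x_i\, \mathsf{T}_i\right)[\mathbf{x}'']_{\times} = \mathbf{0}_{3\times 3},

where l' and l'' are corresponding lines in the second and third views transferred to the line l in the first view, x <-> x' <-> x'' are corresponding points, and [.]_x denotes the cross-product (skew-symmetric) matrix.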