ISBN (digital): 9798331536626
ISBN (print): 9798331536633
This paper introduces Gate-Shift-Pose, an enhanced version of Gate-Shift-Fuse networks, designed for athlete fall classification in figure skating by integrating skeleton pose data alongside RGB frames. We evaluate two fusion strategies: early-fusion, which combines RGB frames with Gaussian heatmaps of pose keypoints at the input stage, and late-fusion, which employs a multi-stream architecture with attention mechanisms to combine RGB and pose features. Experiments on the FR-FS dataset demonstrate that Gate-Shift-Pose significantly outperforms the RGB-only baseline, improving accuracy by up to 40% with ResNet18 and 20% with ResNet50. Early-fusion achieves the highest accuracy (98.08%) with ResNet50, leveraging the model's capacity for effective multimodal integration, while late-fusion is better suited for lighter backbones like ResNet18. These results highlight the potential of multimodal architectures for sports action recognition and the critical role of skeleton pose information in capturing complex motion patterns.
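The early-fusion input described above can be illustrated with a minimal sketch. It assumes a single merged heatmap (per-pixel maximum over all keypoints) appended as a fourth channel, and a hypothetical `sigma` of 2 pixels; the abstract does not say whether the authors use one channel per joint, which `sigma` they choose, or how the channels are ordered, so the function names and parameters here are illustrative only.

```python
import math

def keypoint_heatmap(height, width, keypoints, sigma=2.0):
    """Render one heatmap: per-pixel max over Gaussians centred on each (x, y) keypoint.

    sigma is an assumed value, not one taken from the paper.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap

def early_fuse(rgb_frame, heatmap):
    """Append the heatmap as a fourth channel to an H x W x 3 frame (assumed layout)."""
    return [
        [list(pixel) + [heatmap[y][x]] for x, pixel in enumerate(row)]
        for y, row in enumerate(rgb_frame)
    ]

# A dummy 6 x 8 grey frame with two hypothetical pose keypoints given as (x, y).
frame = [[(0.5, 0.5, 0.5) for _ in range(8)] for _ in range(6)]
heat = keypoint_heatmap(6, 8, [(2, 3), (6, 1)])
fused = early_fuse(frame, heat)
```

Each fused pixel carries the three colour values plus a pose response that peaks at 1.0 on a keypoint and decays with distance, so a backbone whose first convolution accepts four input channels could consume it directly.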
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
As cameras become ubiquitous in our living environment, invasion of privacy is becoming a significant concern. A common approach to privacy preservation is to remove personally identifiable information from a captured image, but there is a risk of the original image being leaked. In this paper, we propose a pre-capture privacy-aware imaging method that captures images from which the details of a pre-specified anonymized target have been eliminated. The proposed method applies a single-pixel imaging framework in which we introduce a feedback mechanism called an aperture pattern generator (APG). The APG adaptively outputs the next aperture pattern, using the data already acquired as a clue, so as to avoid sampling the anonymized target. Furthermore, the anonymized target can be set to any object without changing hardware. Except for the removed detailed features of the anonymized target, the captured images are of comparable quality to those captured by a general camera and can be used for various computer vision applications. We target faces and license plates and experimentally show that the proposed method can capture clear images in which detailed features of the anonymized target are eliminated, achieving both privacy and utility.
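The feedback idea above can be sketched as a coarse-to-fine single-pixel capture. This is not the authors' APG: the block-wise refinement scheme, the function names, and the brightness-threshold `looks_sensitive` rule are all assumptions made for illustration, and the sketch assumes the image dimensions are divisible by the block size.

```python
def measure(scene, pattern):
    """One single-pixel measurement: total light passing a binary aperture pattern."""
    return sum(
        s * p
        for scene_row, pattern_row in zip(scene, pattern)
        for s, p in zip(scene_row, pattern_row)
    )

def block_pattern(height, width, top, left, size):
    """Binary aperture that is open only on a size x size square."""
    return [
        [1 if top <= y < top + size and left <= x < left + size else 0
         for x in range(width)]
        for y in range(height)
    ]

def adaptive_capture(scene, block, looks_sensitive):
    """Measure each block coarsely, then refine it only if it is not flagged.

    looks_sensitive is a hypothetical stand-in for the decision the APG makes
    from already acquired data; flagged blocks are never sampled in detail.
    """
    h, w = len(scene), len(scene[0])
    image = [[0.0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            coarse = measure(scene, block_pattern(h, w, top, left, block)) / (block * block)
            for y in range(top, top + block):
                for x in range(left, left + block):
                    if looks_sensitive(coarse):
                        image[y][x] = coarse  # only the block mean is ever acquired
                    else:
                        image[y][x] = measure(scene, block_pattern(h, w, y, x, 1))
    return image

# Toy scene: the bright top-left 2 x 2 block plays the role of the anonymized target.
scene = [
    [9, 8, 1, 2],
    [7, 8, 3, 1],
    [2, 1, 0, 3],
    [1, 2, 2, 1],
]
captured = adaptive_capture(scene, 2, lambda mean: mean > 5)
```

In this toy run the flagged block is recorded only as its mean, while every other pixel is recovered exactly, which mirrors the stated goal of removing target detail before any detailed data exists.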
The three articles in this special section are selected papers from the IEEE CS Conference on Computer Vision and Pattern Recognition, held in Anchorage, AK, in June 2008.
Identifying predictive covariates, which forecast individual treatment effectiveness, is crucial for decision-making across different disciplines such as personalized medicine. These covariates, referred to as biomark...
The nine award-winning papers in this special section were presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), held 13-18 June 2010 in San Francisco, CA.
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
We present TabGuard, a privacy-preserving framework for end-to-end secure table structure recognition. TabGuard masks all the contents of the table locally and uses the masked table image for structure recognition. Our method is simple yet effective for detecting table cells while preserving the inherent table alignment characteristics needed to reconstruct tables. Our approach benefits from inductive bias, expressed through an approximated table grid, which helps alleviate challenges in detecting cells that are small or have extreme aspect ratios. Experimental results demonstrate that our solution not only establishes a new state of the art on several benchmark datasets but also effectively addresses long-standing challenges associated with dense tables having complex layouts. We make our code publicly available at https://***/sachinraja13/TabGuard.
Medical image segmentation tasks are often intricate and require medical domain expertise. Recent advancements in deep learning have expedited these demanding tasks, transitioning from specialized models tailored to e...
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
In neural networks, recognizing visual patterns is challenging because global average pooling disregards local patterns and relies solely on over-concentrated activation. Global average pooling forces the network to learn objects regardless of their location, so features tend to be activated only in specific regions. To support this claim, we provide a novel analysis, backed by extensive experiments, of the problems that over-concentration causes in networks. We analyze over-concentration through the problems arising from feature variance and from dead neurons that are never activated. Based on our analysis, we introduce a multi-token attention pooling layer to alleviate the over-concentration problem. Our attention-pooling layer captures broad-sight local patterns by learning multiple tokens with the proposed distillation algorithm. It resolves the high-bias and high-variance errors of the learned multi-tokens, which is crucial when aggregating local patterns with multi-tokens. Our method applies to various vision tasks and network architectures such as CNN, ViT, and MLP-Mixer. The proposed method improves baselines with few extra resources, and a network employing our pooling method compares favorably with state-of-the-art networks. We open-source the code at https://***/Lab-LVM/imagenet-models.
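To make the contrast with global average pooling concrete, the following sketch implements a generic multi-token attention pooling over a set of local feature vectors. It is an assumption-laden illustration, not the authors' layer: the scaled dot-product scoring, the averaging of per-token outputs, and all names are chosen here, and the proposed distillation algorithm for learning the tokens is not reproduced.

```python
import math

def global_average_pool(features):
    """Uniform mean over all local feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, tokens):
    """Pool with several query tokens, then average the per-token results.

    Each token scores every location by a scaled dot product, so different
    tokens can attend to different local patterns (assumed design).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    pooled = []
    for token in tokens:
        scores = [sum(t * f for t, f in zip(token, feat)) / scale for feat in features]
        weights = softmax(scores)
        pooled.append(
            [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
        )
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]

# Four local features of dimension 2; the tokens are hand-picked, not learned.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
uniform = attention_pool(features, [[0.0, 0.0]])
focused = attention_pool(features, [[10.0, 0.0], [0.0, 10.0]])
baseline = global_average_pool(features)
```

An all-zero token gives every location the same weight and so reduces to global average pooling, whereas non-zero tokens shift weight towards the locations they match, which is the degree of freedom a learned multi-token layer exploits.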
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
In the field of open-set recognition, conventional models often focus on addressing challenges within a single hierarchical category, and these methods frequently lack interpretability. In this paper, we propose a novel solution that utilizes attributes and hierarchical relationships to achieve interpretable open-set recognition. Our method is centered around the visual-semantic attribute space. By leveraging hierarchy division, we can decompose the attributes into more granular components, thereby yielding additional performance improvements. When confronted with an unfamiliar object, our method not only classifies it as an unknown category but also provides insights into the broader category and its associated attributes. This capability enhances interpretability by offering valuable information regarding the potential category and characteristics of the object. Experimental results demonstrate substantial performance improvements over existing methods.