Chatbots are software typically embedded in Web and Mobile applications designed to assist the user in a plethora of activities, from chit-chatting to task completion. They enable diverse forms of interactions, like t...
ISBN (digital): 9798331536626
ISBN (print): 9798331536633
This paper introduces Gate-Shift-Pose, an enhanced version of Gate-Shift-Fuse networks, designed for athlete fall classification in figure skating by integrating skeleton pose data alongside RGB frames. We evaluate two fusion strategies: early-fusion, which combines RGB frames with Gaussian heatmaps of pose keypoints at the input stage, and late-fusion, which employs a multi-stream architecture with attention mechanisms to combine RGB and pose features. Experiments on the FR-FS dataset demonstrate that Gate-Shift-Pose significantly outperforms the RGB-only baseline, improving accuracy by up to 40% with ResNet18 and 20% with ResNet50. Early-fusion achieves the highest accuracy (98.08%) with ResNet50, leveraging the model's capacity for effective multimodal integration, while late-fusion is better suited for lighter backbones like ResNet18. These results highlight the potential of multimodal architectures for sports action recognition and the critical role of skeleton pose information in capturing complex motion patterns.
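The early-fusion idea in the abstract above (RGB frames combined with Gaussian heatmaps of pose keypoints at the input stage) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `keypoint_heatmap` and `early_fusion_input`, the `sigma` value, and the choice of one heatmap channel per keypoint are all assumptions made for illustration only.

```python
import math


def keypoint_heatmap(height, width, cx, cy, sigma=2.0):
    """Render one keypoint at pixel (cx, cy) as a 2D Gaussian heatmap.

    Hypothetical helper: returns a height x width grid of floats that
    peaks at 1.0 on the keypoint and decays with distance from it.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def early_fusion_input(rgb, keypoints, sigma=2.0):
    """Stack the 3 RGB channels with one heatmap channel per keypoint.

    rgb: list of 3 channels, each a height x width grid (channel-first).
    keypoints: list of (x, y) pixel coordinates.
    Assumption: one extra channel per keypoint; the paper may merge them.
    """
    height, width = len(rgb[0]), len(rgb[0][0])
    heatmaps = [keypoint_heatmap(height, width, x, y, sigma) for x, y in keypoints]
    return list(rgb) + heatmaps


# Toy frame: an 8x8 black image with two made-up keypoints.
h, w = 8, 8
rgb = [[[0.0] * w for _ in range(h)] for _ in range(3)]
fused = early_fusion_input(rgb, [(2, 3), (5, 5)])
```

In this sketch the fused input has 3 + K channels for K keypoints, so a backbone consuming it would need its first convolution widened accordingly; how Gate-Shift-Pose handles that is not stated in the abstract.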
ISBN (print): 9780769549903
The proceedings contain 156 papers. The topics discussed include: real-time mobile food recognition system; style finder: fine-grained clothing style detection and retrieval; stereo camera tracking for mobile devices; towards auto-calibration of smart phones using orientation sensors; detection of moving objects with non-stationary cameras in 5.8ms: bringing motion detection to your mobile device; mobile video capture of multi-page documents; collision detection for visually impaired from a body-mounted camera; video demo: an egocentric vision based assistive co-robot; mobile exergames - burn calories while playing games on a smartphone; a mobile vision system for fast and accurate ellipse detection; stabilization of magnified videos on a mobile device for visually impaired; and an augmented linear discriminant analysis approach for identifying identical twins with the aid of facial asymmetry features.
ISBN (print): 9798350365474
The proceedings contain 802 papers. The topics discussed include: X-VARS: introducing explainability in football refereeing with multi-modal large language models; a hybrid ANN-SNN architecture for low-power and low-latency visual perception; pseudo-label based unsupervised fine-tuning of a monocular 3D pose estimation model for sports motions; towards efficient audio-visual learners via empowering pre-trained vision transformers with cross-modal adaptation; a dual-mode approach for vision-based navigation in a lunar landing scenario; class similarity transition: decoupling class similarities and imbalance from generalized few-shot segmentation; ReweightOOD: loss reweighting for distance-based OOD detection; Hinge-Wasserstein: estimating multimodal aleatoric uncertainty in regression tasks; and ConPro: learning severity representation for medical images using contrastive learning and preference optimization.
This study explores prompt engineering for automated white-box integration testing of RESTful APIs using Large Language Models (LLMs). Four versions of prompts were designed and tested across three OpenAI models (GPT-...
ISBN (print): 9781728125060
The proceedings contain 355 papers. The topics discussed include: MultiNet++: multi-stream feature aggregation and geometric loss strategy for multi-task learning; privacy-preserving action recognition using coded aperture videos; evading face recognition via partial tampering of faces; privacy-preserving annotation of face images through attribute-preserving face synthesis; towards deep neural network training on encrypted data; fooling automated surveillance cameras: adversarial patches to attack person detection; anonymousnet: natural face de-identification with measurable privacy; regularizer to mitigate gradient masking effect during single-step adversarial training; privacy preserving group membership verification and identification; defending against adversarial attacks using random forest; intersection to overpass: instance segmentation on filamentous structures with an orientation-aware neural network and terminus pairing algorithm; and surface parameterization and registration for statistical multiscale atlasing of organ development.
ISBN (print): 0818606339
The following topics are dealt with: edge and boundary analysis; vision systems; motion; shape and 2-D description; stereo and 3-D description; pattern recognition; 3-D models; architectures; vision models and texture; image segmentation; applications and parallel algorithms; 3-D analysis; contour analysis; character recognition; 3-D descriptions from multiple views; and parallel architectures for image processing. 123 papers were presented, of which 121 are published in full in the present proceedings.
ISBN (print): 9798350302493
The proceedings contain 698 papers. The topics discussed include: learning unbiased classifiers from biased data with meta-learning; robustness against gradient based attacks through cost effective network fine-tuning; gradient attention balance network: mitigating face recognition racial bias via gradient attention; estimating and maximizing mutual information for knowledge distillation; synthetic sample selection for generalized zero-shot learning; training strategies for vision transformers for object detection; does image anonymization impact computer vision training?; ultra-sonic sensor based object detection for autonomous vehicles; improvements to image reconstruction-based performance prediction for semantic segmentation in highly automated driving; zero-shot classification at different levels of granularity; difficulty estimation with action scores for computer vision tasks; detail-preserving self-supervised monocular depth with self-supervised structural sharpening; isolated sign language recognition based on tree structure skeleton images; deep prototypical-parts ease morphological kidney stone identification and are competitively robust to photometric perturbations; wildlife image generation from scene graphs; towards characterizing the semantic robustness of face recognition; high-level context representation for emotion recognition in images; and mitigating catastrophic interference using unsupervised multi-part attention for RGB-IR face recognition.
ISBN (print): 9781665448994
The proceedings contain 516 papers. The topics discussed include: OmniLayout: room layout reconstruction from indoor spherical panoramas; boosting adversarial robustness using feature level stochastic smoothing; beyond joint demosaicking and denoising: an image processing pipeline for a pixel-bin image sensor; assessment of deep learning based blood pressure prediction from PPG and rPPG signals; towards domain-specific explainable AI: model interpretation of a skin image classifier using a human approach; DAMSL: domain agnostic meta score-based learning; deep learning based spatial-temporal in-loop filtering for versatile video coding; automated tackle injury risk assessment in contact-based sports - a rugby union example; two-stage network for single image super-resolution; and ***: dataset for automatic mapping of buildings, woodlands, water and roads from aerial imagery.