ISBN: 9781665445092 (print)
Standard methods for video recognition use large CNNs designed to capture spatio-temporal data. However, training these models requires a large amount of labeled training data containing a wide variety of actions, scenes, settings and camera viewpoints. In this paper, we show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in their training data (i.e., unseen view action recognition). To address this, we develop approaches based on 3D representations and introduce a new geometric convolutional layer that can learn viewpoint-invariant representations. Further, we introduce a new, challenging dataset for unseen view recognition and show the approaches' ability to learn viewpoint-invariant representations.
In recent years, emotional speech synthesis has shown considerable progress. However, some existing emotional speech synthesis methods only model emotion from a single scale, resulting in only global or average emotio...
ISBN: 9781665409155 (print)
Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g., African, Asian, etc.) or skin tone (e.g., lighter or darker skin). The use of such sensitive or broad group definitions has disadvantages for bias investigation and for the design of subsequent counter-bias solutions. By contrast, this study introduces an alternative racial bias analysis methodology via facial phenotype attributes for face recognition. We use the set of observable characteristics of an individual face, where a race-related facial phenotype is hence specific to the human face and correlated to the racial profile of the subject. We propose categorical test cases to investigate the individual influence of those attributes on bias within face recognition tasks. We compare our phenotype-based grouping methodology with previous grouping strategies and show that phenotype-based groupings uncover hidden bias without reliance upon any potentially protected attributes or ill-defined grouping strategies. Furthermore, we contribute corresponding phenotype attribute category labels for two face recognition tasks: RFW for face verification and VGGFace2 (test set) for face identification.
Human pose estimation (HPE), the area of computer vision that estimates the spatial configuration of a human body from images or video, has recently received great attention and plays a vital role in several submissions includin...
According to WHO's report from 2021, drowning is the 3rd leading cause of unintentional death worldwide. The use of autonomous drones for drowning recognition can increase the survival rate and help lifeguards and...
Implicit Neural Representations (INRs) are powerful for parameterizing continuous signals in computer vision. However, almost all INR methods are limited to low-level tasks, e.g., image/video compression, super-resolutio...
Despite the significance of Sign Language education, access to resources with immediate feedback remains a challenge. This study aims to assess the effectiveness of an online learning tool offering real-time video fee...
ISBN: 9781665445092 (print)
We present SMURF, a method for unsupervised learning of optical flow that improves the state of the art on all benchmarks by 36% to 40% (over the prior best method, UFlow) and even outperforms several supervised approaches such as PWC-Net and FlowNet2. Our method integrates architecture improvements from supervised optical flow, i.e., the RAFT model, with new ideas for unsupervised learning that include a sequence-aware self-supervision loss, a technique for handling out-of-frame motion, and an approach for learning effectively from multi-frame video data while still requiring only two frames for inference.
Deep Neural Networks have shown remarkable performance in various applications of Visual Pattern Recognition (VPR). This field continues to grow due to the emergence of new architectures, the availability of huge data, and po...
Vision-based human motion analysis encapsulates various tasks ranging from gesture recognition, pose detection, and tracking to complex behavior analysis crucial in the realms of sports analytics, gait analysis in a...