ISBN:
(Print) 9798350395747
Emotion recognition is a key component of human-computer interaction in social robotics. In this paper, we present Empath-Obscura, an innovative ensemble model designed to detect emotions in obfuscated faces. The model combines the cutting-edge object detection models YOLO V5 and V8 with the well-established Poster++ facial emotion recognition model. A significant contribution of this work is the development of a novel data augmentation technique that utilizes SPIGA, a shape-preserving facial landmark detection model, to selectively obscure facial features. This approach enhances the model's robustness against partially hidden facial expressions, improving the performance of the overall model by 13.18%. Empath-Obscura is rigorously validated on the FER-2013 dataset, which is well-suited for this study due to its representation of low-resolution and poor-quality facial images. A manually obfuscated and annotated test set further ensures accurate evaluation. The ensemble model achieved a remarkable accuracy of 69.3%, outperforming the individual models. The results presented in this paper, along with the innovation in our ensemble and data augmentation techniques, offer a significant contribution to the fields of social robotics and emotion recognition. This work provides researchers and practitioners with a robust and reliable tool for emotion detection from obfuscated faces, contributing to advancements in human-computer interaction for social robotics.
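The selective-occlusion idea behind the augmentation above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the region grouping, the `occlude_region`/`augment` helper names, and the mask-by-bounding-box strategy are all assumptions; a landmark detector such as SPIGA would supply the (x, y) points grouped into such regions.

```python
import random

def occlude_region(image, landmarks, region, pad=1, fill=0):
    """Zero out the bounding box around one facial region's landmarks.

    image: 2D list of grayscale pixel values, row-major (image[y][x]).
    landmarks: dict mapping region name -> list of (x, y) points, e.g. as
               grouped from a shape-preserving landmark detector (assumed).
    region: which region to obscure ("left_eye", "mouth", ...).
    """
    pts = landmarks[region]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, x1 = max(min(xs) - pad, 0), min(max(xs) + pad, len(image[0]) - 1)
    y0, y1 = max(min(ys) - pad, 0), min(max(ys) + pad, len(image) - 1)
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            image[y][x] = fill
    return image

def augment(image, landmarks, rng=random):
    """Obscure one randomly chosen region, leaving the rest of the face intact."""
    region = rng.choice(sorted(landmarks))
    return occlude_region(image, landmarks, region)
```

Training on such partially obscured copies alongside the originals is what makes the downstream recognizer less sensitive to hidden facial features.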
ISBN:
(Print) 9781665445092
Automatic methods for detecting presentation attacks are essential to ensure the reliable use of facial recognition technology. Most of the methods available in the literature for presentation attack detection (PAD) fail to generalize to unseen attacks. In recent years, multi-channel methods have been proposed to improve the robustness of PAD systems. Often, only a limited amount of data is available for additional channels, which limits the effectiveness of these methods. In this work, we present a new framework for PAD that uses RGB and depth channels together with a novel loss function. The new architecture uses complementary information from the two modalities while reducing the impact of overfitting. Essentially, a cross-modal focal loss function is proposed to modulate the loss contribution of each channel as a function of the confidence of individual channels. Extensive evaluations on two publicly available datasets demonstrate the effectiveness of the proposed approach.
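One way such a cross-modal focal loss could modulate per-channel contributions is sketched below: each channel's binary cross-entropy term is down-weighted when the other channel is already confidently correct, so the channels focus on complementary samples. The exact formulation in the paper may differ; `cross_modal_focal`, the `gamma` exponent, and the BCE base loss are illustrative assumptions.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p against label y in {0, 1}."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cross_modal_focal(p_rgb, p_depth, y, gamma=2.0):
    """Hypothetical cross-modal focal loss for a two-channel PAD system.

    Each channel's loss is scaled by (1 - confidence of the OTHER channel)
    raised to gamma, so a confident depth branch relaxes the RGB term and
    vice versa.
    """
    conf_rgb = p_rgb if y == 1 else 1 - p_rgb      # prob. assigned to true class
    conf_depth = p_depth if y == 1 else 1 - p_depth
    w_rgb = (1 - conf_depth) ** gamma              # depth confident -> small weight
    w_depth = (1 - conf_rgb) ** gamma
    return w_rgb * bce(p_rgb, y) + w_depth * bce(p_depth, y)
```

Under this form, when both channels are confidently correct the loss is nearly zero, and when one channel struggles, its partner's term dominates the gradient.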
Home security is a crucial aspect that requires careful attention, particularly when it comes to addressing theft concerns. Hence, implementing smart door technology equipped with facial recognition holds promising po...
ISBN:
(Digital) 9798350374490
ISBN:
(Print) 9798350374490; 9798350374506
This work addresses the challenge of data scarcity in personality-labeled datasets by introducing personality labels to clips from two open datasets, ZeroEGGS and Bandai, which provide diverse full-body animations. To this end, we present a user study to annotate short clips from both sets with labels based on the Five-Factor Model (FFM) of personality. We chose features informed by Laban Movement Analysis (LMA) to represent each animation. These features then guided our selection of samples with distinct motion styles for inclusion in the user study, yielding high personality variance while keeping the study duration and cost viable. Using the labeled data, we then ran a correlation analysis to find features that correlate strongly with each personality dimension. Our regression analysis results indicate that highly correlated features are promising for accurate personality estimation. We share our early findings, code, and data publicly.
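The correlation step can be illustrated with a plain Pearson correlation between one motion feature and one personality score across annotated clips. This is a generic sketch, not the authors' code; the feature values and labels below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between one LMA-informed motion feature (xs)
    and one FFM personality score (ys), both sampled per annotated clip."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical usage: average limb speed per clip vs. rated extraversion.
speed = [0.4, 1.1, 0.9, 1.6]
extraversion = [2.0, 3.5, 3.0, 4.5]
r = pearson(speed, extraversion)   # close to +1 for these toy values
```

Features with |r| near 1 for some dimension are the ones the abstract flags as promising regressors.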
ISBN:
(Print) 9781665445092
Fairness is becoming an increasingly crucial issue for computer vision, especially in human-related decision systems. However, achieving algorithmic fairness, which requires a model to produce non-discriminatory outcomes for protected groups, is still an unresolved problem. In this paper, we devise a systematic approach that reduces algorithmic biases via feature distillation for visual recognition tasks, dubbed MMD-based Fair Distillation (MFD). While distillation has been widely used to improve prediction accuracy, to the best of our knowledge, there has been no explicit work that also tries to improve fairness via distillation. Furthermore, we give a theoretical justification of our MFD on the effect of knowledge distillation and fairness. Through extensive experiments, we show our MFD significantly mitigates the bias against specific minorities without any loss of accuracy on both synthetic and real-world face datasets.
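The MMD quantity underlying such a distillation penalty can be sketched directly. This is a textbook squared-MMD estimate with an RBF kernel, shown here as a stand-in for the paper's distillation term; how the feature sets are formed (e.g. student features of one protected group vs. teacher features) is an assumption.

```python
import math

def rbf(u, v, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between two
    feature sets X and Y; zero when the sets match, larger as they diverge.
    Used here to illustrate an MMD-style distillation penalty."""
    n, m = len(X), len(Y)
    kxx = sum(rbf(x, x2, sigma) for x in X for x2 in X) / (n * n)
    kyy = sum(rbf(y, y2, sigma) for y in Y for y2 in Y) / (m * m)
    kxy = sum(rbf(x, y, sigma) for x in X for y in Y) / (n * m)
    return kxx + kyy - 2 * kxy
```

Minimizing this term alongside the task loss pushes the student's feature distributions for different groups toward the teacher's, which is the mechanism the abstract credits for reducing bias without hurting accuracy.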
In this paper, we present a direct adaptation strategy (ADAS), which aims to directly adapt a single model to multiple target domains in a semantic segmentation task without pretrained domain-specific models. To do so...
ISBN:
(Print) 9781665477291
As artificial intelligence models continue to grow in capacity and sophistication, they are often trusted with very sensitive information. In the sub-field of adversarial machine learning, developments are geared solely towards finding reliable methods to systematically erode the ability of artificial intelligence systems to perform as intended. These techniques can cause serious breaches of security, interruptions to major systems, and irreversible damage to consumers. Our research evaluates the effects of various white-box adversarial machine learning attacks on popular computer vision deep learning models, leveraging a public X-ray dataset from the National Institutes of Health (NIH). We use several experiments to gauge the feasibility of developing deep learning models that are robust to adversarial attacks, taking into account different defense strategies, such as adversarial training, to observe how adversarial attacks evolve over time. Our research details how a variety of white-box attacks affect different components of InceptionNet, DenseNet, and ResNeXt, and suggests how the models can effectively defend against these attacks.
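A representative white-box attack of the kind evaluated above is the Fast Gradient Sign Method (FGSM): perturb each input feature by a small step in the sign of the loss gradient. The sketch below applies it to a logistic-regression "model" so the gradient is available in closed form; for deep networks the same sign step would use backpropagated gradients. The abstract does not name FGSM specifically, so treat this as an illustrative example.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM on a logistic-regression model p = sigmoid(w.x + b).

    For binary cross-entropy, d(loss)/dx_i = (p - y) * w_i, so each feature
    moves eps in the sign of that gradient, increasing the loss.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi + eps * math.copysign(1.0, (p - y) * wi)
            for xi, wi in zip(x, w)]
```

Adversarial training, one of the defenses the abstract mentions, amounts to generating such perturbed inputs during training and including them in the loss.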
ISBN:
(Print) 9781665445092
We study the problem of directly optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall. Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown. We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations. The learned value function is easily pluggable into existing optimizers like SGD and Adam, and is effective for rapidly finetuning a pre-trained model. This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision. MetricOpt achieves state-of-the-art performance on a variety of metrics for (image) classification, image retrieval and object detection. Solid benefits are found over competing methods, which often involve complex loss design or adaptation. MetricOpt also generalizes well to new tasks and model architectures.
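The core idea of steering parameters by a differentiable stand-in for a black-box metric can be conveyed with a much cruder device: a finite-difference gradient of the metric with respect to a small parameter vector, followed by an ordinary gradient step. MetricOpt instead learns a value function (and plugs it into SGD/Adam), so everything below is a simplified stand-in, and all names are hypothetical.

```python
def surrogate_grad(metric_fn, theta, h=1e-3):
    """Central finite-difference gradient of a black-box metric w.r.t.
    compact parameters theta — a crude stand-in for differentiating a
    learned value function."""
    g = []
    for i in range(len(theta)):
        tp, tm = theta[:], theta[:]
        tp[i] += h
        tm[i] -= h
        g.append((metric_fn(tp) - metric_fn(tm)) / (2 * h))
    return g

def finetune_step(metric_fn, theta, lr=0.1):
    """One descent step on the metric (assumed lower-is-better, e.g.
    misclassification rate)."""
    g = surrogate_grad(metric_fn, theta)
    return [t - lr * gi for t, gi in zip(theta, g)]
```

The learned-value-function approach matters precisely because the real metric (recall, error rate) is expensive and non-smooth, where naive finite differences would be noisy and costly; the sketch only shows where metric-derived gradients enter the finetuning loop.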
ISBN:
(Print) 9781665445092
This paper strives for repetitive activity counting in videos. Different from existing works, which all analyze the visual video content only, we incorporate for the first time the corresponding sound into the repetition counting process. This benefits accuracy in challenging vision conditions such as occlusion, dramatic camera view changes, low resolution, etc. We propose a model that starts by analyzing the sight and sound streams separately. Then an audiovisual temporal stride decision module and a reliability estimation module are introduced to exploit cross-modal temporal interaction. For learning and evaluation, an existing dataset is repurposed and reorganized to allow for repetition counting with sight and sound. We also introduce a variant of this dataset for repetition counting under challenging vision conditions. Experiments demonstrate the benefit of sound, as well as of the other introduced modules, for repetition counting. Our sight-only model already outperforms the state of the art by itself; when we add sound, results improve notably, especially under harsh vision conditions. The code and datasets are available at https://***/xiaobai1217/RepetitionCounting.
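The role of a reliability estimate in combining the two streams can be reduced to a one-line weighted fusion. This is a deliberately simplified stand-in for the paper's reliability estimation module (which operates on learned features, not final counts); the function and its inputs are hypothetical.

```python
def fuse_counts(count_v, rel_v, count_a, rel_a):
    """Reliability-weighted fusion of per-stream repetition estimates.

    count_v / count_a: repetition counts from the visual and audio streams.
    rel_v / rel_a: estimated reliabilities (e.g. from a learned module);
    the less reliable stream contributes proportionally less.
    """
    w = rel_v / (rel_v + rel_a)
    return w * count_v + (1 - w) * count_a
```

Under heavy occlusion the visual reliability drops, shifting weight toward the audio estimate, which is the intuition behind the reported gains in harsh vision conditions.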
ISBN:
(Print) 9781665445092
A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets is detecting and recognizing text present in the images using an optical character recognition (OCR) system. Current systems are crippled by the unavailability of ground-truth text annotations for these datasets, as well as by the lack of scene text detection and recognition datasets of real images, which stalls progress in OCR and prevents evaluating scene-text-based reasoning in isolation from OCR systems. In this work, we propose TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images from the TextVQA dataset. We show that current state-of-the-art text recognition (OCR) models fail to perform well on TextOCR, and that training on TextOCR helps achieve state-of-the-art performance on multiple other OCR datasets as well. We use a TextOCR-trained OCR model to create the PixelM4C model, which can do scene-text-based reasoning on an image in an end-to-end fashion, allowing us to revisit several design choices to achieve new state-of-the-art performance on the TextVQA dataset.