ISBN (print): 9781479943098
This paper presents a mobile application for real-time facial expression recognition running on a smartphone with a camera. The proposed system uses a set of Support Vector Machines (SVMs) to classify six basic emotions and the neutral expression, and it also checks the mouth status. The facial expression features for emotion recognition are extracted by fitting Active Shape Model (ASM) landmarks to the face, and dynamic features are then generated from the displacement between the neutral and expression features. Experiments show 86% accuracy under 10-fold cross-validation on 309 video samples from the extended Cohn-Kanade (CK+) dataset. Using the same SVM models, the mobile app runs on a Samsung Galaxy S3 at 2.4 fps. The accuracy of real-time mobile emotion recognition is about 72% for the six basic emotions and the neutral expression posed by seven subjects who are not professional actors.
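The dynamic feature step can be illustrated with a short sketch. It assumes each frame's ASM fit is a list of (x, y) landmark coordinates in a fixed order, and it forms the feature vector by subtracting neutral landmark positions from expression positions. The fold helper shows an unshuffled 10-fold split of 309 samples; the abstract does not say how the folds were formed (for example, whether they were subject-independent), so both helpers and their names are hypothetical, and no SVM training is included.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def displacement_features(neutral: List[Point], expression: List[Point]) -> List[float]:
    """Flatten per-landmark (dx, dy) displacements from the neutral frame to the expression frame."""
    if len(neutral) != len(expression):
        raise ValueError("both frames must have the same number of landmarks")
    features: List[float] = []
    for (xn, yn), (xe, ye) in zip(neutral, expression):
        features.extend([xe - xn, ye - yn])
    return features


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous, unshuffled folds; return (train, test) pairs."""
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


# Example: two landmarks, and the fold sizes for 309 samples.
print(displacement_features([(0.0, 0.0), (1.0, 1.0)], [(1.0, 2.0), (1.0, 0.0)]))  # -> [1.0, 2.0, 0.0, -1.0]
print([len(test) for _, test in k_fold_indices(309, 10)])
```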
ISBN (print): 9781665448994
Continual learning (CL) has become one of the most active research areas within the artificial intelligence community in recent years. Given the significant attention paid to continual learning, the need for a library that facilitates both research and development in this field is more visible than ever. However, implementations of CL algorithms are currently scattered across isolated repositories written with different frameworks, making it difficult for researchers and practitioners to work with various CL algorithms and benchmarks through the same interface. In this paper, we introduce CL-Gym, a full-featured continual learning library that overcomes this challenge and accelerates research and development. In addition to the infrastructure needed to run end-to-end continual learning experiments, CL-Gym includes benchmarks for various CL scenarios and several state-of-the-art CL algorithms. We present the architecture, design philosophies, and technical details behind CL-Gym (1).
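Running an end-to-end CL experiment involves at least a task sequence and standard evaluation metrics. The sketch below is not CL-Gym's API, and all function names are hypothetical. It splits a class list into a class-incremental task sequence and computes two commonly reported metrics from an accuracy matrix `acc`, where `acc[i][j]` is the accuracy on task `j` after training on task `i`: average accuracy over the final row, and average forgetting (the drop from a task's best earlier accuracy to its final accuracy).

```python
from typing import List, Sequence


def split_classes_into_tasks(classes: Sequence[int], classes_per_task: int) -> List[List[int]]:
    """Class-incremental scenario: consecutive groups of classes form successive tasks."""
    return [list(classes[i:i + classes_per_task]) for i in range(0, len(classes), classes_per_task)]


def average_accuracy(acc: List[List[float]]) -> float:
    """acc[i][j] is accuracy on task j after training on task i; average the final row."""
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each earlier task's best accuracy to its accuracy after the last task."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        best_before_end = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before_end - acc[-1][j])
    return sum(drops) / len(drops)


print(split_classes_into_tasks(range(10), 2))  # -> [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
acc = [[0.9, 0.0], [0.7, 0.8]]
print(average_accuracy(acc), average_forgetting(acc))
```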
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
In this paper, we introduce our hybrid image and video compression scheme enhanced by a CNN-optimized in-loop filter. Specifically, a Structure Preserving in-Loop Filter (SPiLF) is incorporated into the hybrid video codec Enhanced Compression Model (ECM); it has two branches, a gradient branch and a pixel branch, both built on the dense residual unit (DRU). To provide pleasing visual quality, a generative adversarial network (GAN) loss and an LPIPS loss are also used. The proposal therefore focuses mainly on perception-friendly image compression for human vision, while video compression is left for further investigation. The experiments show that the proposed method achieves better visual quality than traditional methods.
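One way to read the combination of a gradient branch, a pixel branch, and perceptual losses is as a weighted training objective. The sketch below assumes grayscale images stored as nested lists, uses forward differences as the gradient operator, and takes the GAN and LPIPS terms as precomputed numbers because computing them requires trained networks. The weights and function names are hypothetical; the abstract does not give SPiLF's actual loss formulation.

```python
from typing import List, Tuple

Image = List[List[float]]


def gradients(img: Image) -> Tuple[Image, Image]:
    """Forward-difference gradients along x and y; the last column/row gets 0."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)] for y in range(h)]
    return gx, gy


def mean_abs(a: Image, b: Image) -> float:
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def mean_sq(a: Image, b: Image) -> float:
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def filter_loss(recon: Image, target: Image, gan_term: float, lpips_term: float,
                w_pix: float = 1.0, w_grad: float = 1.0,
                w_gan: float = 0.01, w_lpips: float = 1.0) -> float:
    """Hypothetical weighted objective: pixel fidelity + gradient fidelity + perceptual terms."""
    gx_r, gy_r = gradients(recon)
    gx_t, gy_t = gradients(target)
    grad_loss = mean_abs(gx_r, gx_t) + mean_abs(gy_r, gy_t)
    return (w_pix * mean_sq(recon, target) + w_grad * grad_loss
            + w_gan * gan_term + w_lpips * lpips_term)
```

The gradient term penalizes blurred or shifted edges even when the pixel error is small, which is one plausible motivation for a structure-preserving branch.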
ISBN (print): 9781424439942
The number of digital images that must be acquired, analyzed, classified, stored, and retrieved in medical centers is growing exponentially with advances in medical imaging technologies. Accordingly, medical image classification and retrieval have become popular topics in recent years. Despite many projects focusing on this problem, proposed solutions are still far from accurate enough for real-life deployment. Interpreting medical image classification and retrieval as a multi-class classification task, in this work we investigate the performance of five different feature types in an SVM-based learning framework for classifying human body X-ray images into classes corresponding to body parts. Our comprehensive experiments show that four conventional feature types perform comparably to results in the literature but with low per-class accuracies, whereas local binary patterns produce not only very good global accuracy but also good class-specific accuracies relative to the features used in the literature.
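Local binary patterns are simple to compute, which helps explain their appeal as an SVM input. The sketch below implements the basic 3x3 LBP operator: each of the eight neighbours that is at least as bright as the centre pixel sets one bit, and the normalized 256-bin histogram of codes over the image serves as the feature vector. The abstract does not state which LBP variant, neighbourhood, or spatial gridding was used, so this basic form and the function names are assumptions.

```python
from typing import List

# Neighbour offsets (dy, dx), clockwise from the top-left pixel; position k sets bit k.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: List[List[int]]) -> List[List[int]]:
    """Basic 8-neighbour LBP code for every interior pixel of a grayscale image."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes, usable as a fixed-length feature vector."""
    hist = [0] * 256
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
    total = sum(hist)
    if total == 0:
        return [0.0] * 256
    return [count / total for count in hist]


print(lbp_codes([[9, 0, 0], [0, 5, 0], [0, 0, 0]]))  # -> [[1]]
```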
ISBN (print): 9781424439942
An algorithm is proposed for the 3D modeling of static scenes based solely on the range and intensity data acquired by a Time-of-Flight camera during an arbitrary movement. No additional scene acquisition devices, such as inertial sensors, positioning robots, or intensity-based cameras, are used. The current pose is estimated by maximizing the uncentered correlation coefficient between edges detected in the current frame and in a preceding frame, at a frame rate of at least four fps and with an average accuracy of 45 mm. The paper also describes several extensions for robust registration, such as multiresolution hierarchies and a projection-based Iterative Closest Point (ICP) algorithm. The basic registration algorithm and its extensions were extensively evaluated against ground truth data to validate their accuracy, robustness, and real-time capability.
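The uncentered correlation coefficient differs from Pearson's coefficient in that the means are not subtracted: r = sum(a_i * b_i) / sqrt(sum(a_i^2) * sum(b_i^2)). The toy sketch below maximizes it over integer 2D shifts between two binary edge maps. It only illustrates the objective under a strong simplifying assumption: the paper estimates the camera pose from Time-of-Flight data, not a 2D image translation, and the function names and search range are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def uncentered_correlation(a: List[float], b: List[float]) -> float:
    """sum(a*b) / sqrt(sum(a^2) * sum(b^2)); unlike Pearson's r, means are not subtracted."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def shifted_overlap(prev: Grid, cur: Grid, dy: int, dx: int) -> Tuple[List[float], List[float]]:
    """Pair prev[y][x] with cur[y+dy][x+dx] over the region where both maps overlap."""
    h, w = len(prev), len(prev[0])
    a, b = [], []
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                a.append(prev[y][x])
                b.append(cur[yy][xx])
    return a, b


def best_shift(prev: Grid, cur: Grid, max_shift: int = 2) -> Tuple[Tuple[int, int], float]:
    """Exhaustive search for the (dy, dx) shift with the highest uncentered correlation."""
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = uncentered_correlation(*shifted_overlap(prev, cur, dy, dx))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score


# Example: an L-shaped edge that moved one pixel down and one to the right.
prev = [[0.0] * 5 for _ in range(5)]
for y, x in [(1, 1), (1, 2), (2, 1)]:
    prev[y][x] = 1.0
cur = [[0.0] * 5 for _ in range(5)]
for y, x in [(2, 2), (2, 3), (3, 2)]:
    cur[y][x] = 1.0
print(best_shift(prev, cur))  # -> ((1, 1), 1.0)
```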
ISBN (print): 9781424423392
We show how to outsource data annotation to Amazon Mechanical Turk. Doing so has produced large numbers of annotations relatively cheaply. The quality is good and can be checked and controlled. Annotations are produced quickly. We describe results for several different annotation problems, as well as some strategies for determining when a task is well specified and properly priced.
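Checking and controlling annotation quality is often done by collecting several labels per item and by seeding tasks with items whose answers are already known. The sketch below shows both ideas: majority voting over redundant labels, and per-worker accuracy on gold-standard items. These are common crowdsourcing practices offered as an assumption; the abstract does not specify which quality-control strategies the authors used, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def majority_vote(labels: List[str]) -> str:
    """Most frequent label among redundant annotations; ties go to the label given first."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    top = max(counts.values())
    return next(label for label in labels if counts[label] == top)


def gold_accuracy(annotations: List[Tuple[str, str, str]], gold: Dict[str, str]) -> Dict[str, float]:
    """annotations are (worker_id, item_id, label); score each worker on items with a known answer."""
    seen: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for worker, item, label in annotations:
        if item in gold:
            seen[worker] = seen.get(worker, 0) + 1
            correct[worker] = correct.get(worker, 0) + int(label == gold[item])
    return {worker: correct[worker] / seen[worker] for worker in seen}


print(majority_vote(["cat", "dog", "cat"]))  # -> cat
```

Workers whose gold accuracy falls below a chosen threshold can then be excluded, and their remaining labels dropped before voting.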
ISBN (print): 9781728193601
Human pose estimation, the task of locating joint positions, is a well-known problem in computer vision. Existing pose datasets are observed to be insufficiently challenging in terms of pose diversity, object occlusion, and viewpoints. This makes the pose annotation process relatively simple and restricts the applicability of models trained on them. To handle more variety in human poses, we propose the concept of fine-grained hierarchical pose classification, in which we formulate pose estimation as a classification task, and we propose Yoga-82, a dataset for large-scale yoga pose recognition with 82 classes. Yoga-82 contains complex poses for which fine annotations may not be possible. To address this, we provide hierarchical labels for yoga poses based on the body configuration of each pose. The dataset has a three-level hierarchy consisting of body positions, variations in body positions, and the actual pose names. We report the classification accuracy of state-of-the-art convolutional neural network architectures on Yoga-82. We also present several hierarchical variants of DenseNet that exploit the hierarchical labels.
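With a three-level label hierarchy, a single fine-grained (leaf) prediction also implies predictions at the coarser levels, so accuracy can be reported per level. The sketch below assumes a mapping from each leaf class to its (level-1, level-2) ancestors; the class names in the example are hypothetical placeholders, not Yoga-82 labels, and the sketch does not reproduce the paper's hierarchical DenseNet variants.

```python
from typing import Dict, List, Tuple


def level_accuracies(preds: List[str], truths: List[str],
                     hierarchy: Dict[str, Tuple[str, str]]) -> Tuple[float, float, float]:
    """hierarchy maps leaf -> (level1, level2); returns accuracy at levels 1, 2 and 3 (leaf)."""
    n = len(truths)
    hits = [0, 0, 0]
    for p, t in zip(preds, truths):
        p1, p2 = hierarchy[p]
        t1, t2 = hierarchy[t]
        hits[0] += int(p1 == t1)
        hits[1] += int((p1, p2) == (t1, t2))  # compare the full path in case level-2 names repeat
        hits[2] += int(p == t)
    return hits[0] / n, hits[1] / n, hits[2] / n


# Hypothetical placeholder hierarchy (not real Yoga-82 class names).
hierarchy = {
    "pose_a": ("standing", "standing_forward"),
    "pose_b": ("standing", "standing_forward"),
    "pose_c": ("standing", "standing_side"),
    "pose_d": ("sitting", "sitting_twist"),
}
print(level_accuracies(["pose_b", "pose_c", "pose_d"], ["pose_a", "pose_a", "pose_d"], hierarchy))
```

Coarse-level accuracy is at least as high as leaf accuracy here, since a leaf error can still land in the correct body position or variation.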
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
A false negative in object detection describes an object that was not correctly localised and classified by a detector. In prior work, we introduced five 'false negative mechanisms' that identify the specific component inside the detector architecture that failed to detect the object. Using these mechanisms, we explore how different computer vision datasets and their inherent characteristics can influence object detector failures. Specifically, we investigate the false negative mechanisms of Faster R-CNN and RetinaNet across five computer vision datasets, namely Microsoft COCO, Pascal VOC, ExDark, ObjectNet, and COD10K. Our results show that object size and class influence the false negative mechanisms of object detectors. We also show that comparing the false negative mechanisms of a single object class across different datasets can highlight potentially unknown biases in datasets.
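Deciding whether an object is a false negative at all is the step before attributing it to a mechanism. The sketch below uses a common convention, assumed here rather than taken from the paper: a ground-truth object counts as a false negative if no detection of the same class matches it with an IoU of at least 0.5, with detections matched greedily in descending score order. The five false negative mechanisms themselves are not described in this abstract and are not implemented; the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def false_negatives(gts: List[Tuple[Box, str]], dets: List[Tuple[Box, str, float]],
                    iou_thr: float = 0.5) -> List[int]:
    """Indices of ground truths left unmatched after greedy, class-aware, score-ordered matching."""
    matched = set()
    for box, cls, _score in sorted(dets, key=lambda d: d[2], reverse=True):
        best_idx, best_iou = -1, iou_thr
        for i, (gt_box, gt_cls) in enumerate(gts):
            if i in matched or gt_cls != cls:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_idx, best_iou = i, overlap
        if best_idx >= 0:
            matched.add(best_idx)
    return [i for i in range(len(gts)) if i not in matched]


gts = [((0, 0, 10, 10), "car"), ((20, 20, 30, 30), "person")]
dets = [((1, 1, 10, 10), "car", 0.9), ((20, 20, 30, 30), "car", 0.8)]
print(false_negatives(gts, dets))  # -> [1]  (the person was detected only with the wrong class)
```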
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
Temporal action localization in untrimmed videos is a difficult problem in computer vision. It is challenging to accurately infer the start and end of activity instances on small-scale datasets that cover multi-view information. In this paper, we propose an effective temporal activity localization and classification method that localizes the temporal boundaries and predicts the class labels of activities in naturalistic driving. Our approach includes (i) a method for recognizing and localizing distracted-driving behavior in naturalistic driving videos from small-scale datasets, (ii) a strategy that uses a multi-branch network to make full use of information from different channels, and (iii) a post-processing method for selecting and correcting temporal ranges so that our system finds accurate boundaries. Frame-level object detection information is also utilized. Extensive experiments demonstrate the effectiveness of our method, which ranked 6th on Test-A2 of Track 3 of the 6th AI City Challenge.
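The post-processing stage turns frame-level predictions into temporal segments. The sketch below assumes each camera view (branch) yields per-frame class probabilities, fuses views by simple averaging, takes the per-frame argmax, merges consecutive frames with the same label into segments, and drops background and segments shorter than a minimum duration. The fusion rule, background class index, threshold, and function names are hypothetical; the paper's actual selection and correction rules are not given in the abstract.

```python
from typing import List, Tuple


def fuse_views(view_scores: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities over views: [view][frame][class] -> [frame][class]."""
    n_views = len(view_scores)
    n_frames, n_classes = len(view_scores[0]), len(view_scores[0][0])
    return [[sum(view_scores[v][f][c] for v in range(n_views)) / n_views
             for c in range(n_classes)]
            for f in range(n_frames)]


def frames_to_segments(probs: List[List[float]], fps: float, min_len_s: float = 1.0,
                       background: int = 0) -> List[Tuple[int, float, float]]:
    """Merge runs of frames with the same argmax class into (class, start_s, end_s),
    dropping background runs and runs shorter than min_len_s seconds."""
    labels = [max(range(len(p)), key=lambda c: p[c]) for p in probs]
    segments: List[Tuple[int, float, float]] = []
    start = 0
    for f in range(1, len(labels) + 1):
        # A run ends at the last frame or where the label changes.
        if f == len(labels) or labels[f] != labels[start]:
            cls = labels[start]
            if cls != background and (f - start) / fps >= min_len_s:
                segments.append((cls, start / fps, f / fps))
            start = f
    return segments


bg, act = [0.9, 0.1, 0.0], [0.2, 0.8, 0.0]
view = [bg, bg, act, act, act, bg]
print(frames_to_segments(fuse_views([view, view]), fps=2.0))  # -> [(1, 1.0, 2.5)]
```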
ISBN (print): 9781479943098
In this paper we present a novel approach to detecting groups in ego-vision scenarios. People in the scene are tracked through the video sequence, and their head pose and 3D location are estimated. Based on the concept of F-formation, we use orientation and distance to define an inherently social pairwise feature that describes the affinity of a pair of people in the scene. We apply a correlation clustering algorithm that merges pairs of people into socially related groups. Because social interactions change constantly and orientations and distances can carry different meanings in different contexts, we learn the weight vector of the correlation clustering using Structural SVMs. We extensively test our approach on two publicly available datasets, with encouraging results for detecting groups from first-person camera views.
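The pipeline of pairwise features, a linear affinity, and correlation clustering can be sketched as follows. Assumptions: each person is a ground-plane position plus a head yaw angle, the feature is [distance, mutual facing], the weight vector `w` is hand-set rather than learned with a Structural SVM, and clustering uses a greedy merge heuristic, since exact correlation clustering is NP-hard. The paper's F-formation feature and inference procedure may differ, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Person = Tuple[float, float, float]  # (x, z) ground-plane position and head yaw in radians


def pairwise_feature(p: Person, q: Person) -> List[float]:
    """[distance, mutual facing]; facing is 1 if both heads point at each other, -1 if both face away."""
    dx, dz = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dz)
    dir_pq = math.atan2(dz, dx)  # direction from p towards q
    facing_p = math.cos(p[2] - dir_pq)
    facing_q = math.cos(q[2] - (dir_pq + math.pi))
    return [dist, (facing_p + facing_q) / 2.0]


def affinity_matrix(people: Sequence[Person], w: Sequence[float], bias: float) -> List[List[float]]:
    """Linear score w . f + bias for each pair; positive means 'same group'. Here w is hand-set."""
    n = len(people)
    score = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            f = pairwise_feature(people[i], people[j])
            s = sum(wk * fk for wk, fk in zip(w, f)) + bias
            score[i][j] = score[j][i] = s
    return score


def greedy_correlation_clustering(score: List[List[float]]) -> List[List[int]]:
    """Start from singletons and repeatedly merge the two clusters whose summed cross affinity
    is largest and positive; stop when no merge helps. A heuristic, not an exact solver."""
    clusters = [[i] for i in range(len(score))]
    while True:
        best, best_gain = None, 0.0
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gain = sum(score[i][j] for i in clusters[a] for j in clusters[b])
                if gain > best_gain:
                    best, best_gain = (a, b), gain
        if best is None:
            return clusters
        a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected


# Two people face each other 1 m apart; a third stands 10 m away facing elsewhere.
people = [(0.0, 0.0, 0.0), (1.0, 0.0, math.pi), (10.0, 0.0, 0.0)]
print(greedy_correlation_clustering(affinity_matrix(people, w=[-1.0, 2.0], bias=1.0)))  # -> [[0, 1], [2]]
```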