ISBN (Print): 9781509014378
We describe an FPGA-based on-board control system for autonomous orientation of an aerial robot to assist aerial manipulation tasks. The system applies yaw control to help an operator precisely position a drone when it is near a bar-like object. This is achieved with a parallel Hough transform enhanced by a novel image-space separation method, which yields highly reliable results under varied conditions combined with high performance. The feasibility of this approach is shown by applying the system to a multi-rotor aerial robot equipped with an upward-directed robotic hand on top of the airframe, developed for high-altitude manipulation tasks. To grasp a bar-like object, the orientation of the bar is observed from image data obtained by a monocular camera mounted on the robot. This data is then analyzed by the on-board FPGA system to control the yaw angle of the aerial robot. In experiments, reliable yaw-orientation control of the aerial robot is achieved.
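As a rough illustration of the vision-to-control loop described above, here is a minimal numpy sketch: a coarse Hough transform recovers the dominant line orientation from a binary edge image, and a hypothetical proportional controller turns the angular error into a yaw command. The `yaw_command` gain and target angle are illustrative assumptions, not the paper's FPGA design.

```python
import numpy as np

def dominant_line_angle(edges, n_theta=180):
    """Estimate the Hough-normal angle (degrees, 0-179) of the strongest
    line in a binary edge image using a coarse Hough transform."""
    ys, xs = np.nonzero(edges)
    thetas = np.deg2rad(np.arange(n_theta))            # 1-degree resolution
    diag = int(np.ceil(np.hypot(*edges.shape)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    # rho = x*cos(theta) + y*sin(theta), shifted so indices are non-negative
    rhos = np.round(xs[:, None] * np.cos(thetas)
                    + ys[:, None] * np.sin(thetas)).astype(int) + diag
    for t in range(n_theta):
        np.add.at(acc[:, t], rhos[:, t], 1)            # vote per (rho, theta)
    _, t_best = np.unravel_index(acc.argmax(), acc.shape)
    return float(np.degrees(thetas[t_best]))

def yaw_command(bar_angle_deg, target_deg=90.0, gain=0.01):
    """Hypothetical proportional yaw controller: drive the wrapped angular
    error between the observed bar and the gripper axis toward zero."""
    error = ((bar_angle_deg - target_deg + 90.0) % 180.0) - 90.0
    return -gain * error
```

A horizontal bar in the image produces a Hough-normal angle near 90 degrees, so the controller's error, and hence the yaw command, goes to zero as the drone aligns with the bar.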
Person-independent and pose-invariant estimation of eye gaze is important for situation analysis and for automated video annotation. We propose a fast cascade-regression-based method that first estimates the location of a dense set of markers and their visibility, then reconstructs face shape by fitting a part-based 3D model. Next, the reconstructed 3D shape is used to estimate a canonical view of the eyes for 3D gaze estimation. The model operates in a feature space that naturally encodes local ordinal properties of pixel intensities, leading to photometrically invariant estimation of gaze. To evaluate the algorithm against alternative approaches, three publicly available databases were used: Boston University Head Tracking, Multi-View Gaze, and CAVE Gaze. Precision for head pose and gaze averaged 4 degrees or less for pitch, yaw, and roll. The algorithm outperformed alternative methods on these datasets.
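The pose-normalization step can be illustrated with a small sketch: given reconstructed 3D eye landmarks and an estimated head rotation matrix, applying the inverse rotation yields a canonical frontal view. This is a generic assumption about such a pipeline, not the authors' exact formulation.

```python
import numpy as np

def canonical_eye_view(landmarks_3d, head_rotation):
    """Map reconstructed 3D eye landmarks (N, 3) into a frontal,
    pose-normalized frame by applying the inverse head rotation
    (R^-1 = R^T for rotation matrices), so gaze can be estimated
    independently of head pose."""
    landmarks_3d = np.asarray(landmarks_3d, dtype=float)
    return landmarks_3d @ head_rotation    # row vectors: x' = R^T x
```

Rotating canonical points into a head pose and then canonicalizing them recovers the originals, which is the invariance the pipeline relies on.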
We propose a novel approach to template-based face recognition. Our dual goal is to both increase recognition accuracy and reduce the computational and storage costs of template matching. To do this, we leverage an approach which has proven effective in many other domains but, to our knowledge, has never been fully explored for face images: average pooling of face photos. We show how (and why!) the space of a template's images can be partitioned and then pooled based on image quality and head pose, and the effect this has on accuracy and template size. We perform extensive tests on the IJB-A and Janus CS2 template-based face identification and verification benchmarks. These show not only that our approach outperforms the published state of the art despite requiring far fewer cross-template comparisons, but also, surprisingly, that image pooling performs on par with deep feature pooling.
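A minimal sketch of the pooling idea: partition a template's images by coarse head-pose bin and a quality threshold, then replace each non-empty cell with its pixelwise average. The bin edges and threshold below are illustrative assumptions, not the paper's actual partition.

```python
import numpy as np

def pool_template(images, poses, qualities,
                  pose_bins=(-90, -30, 30, 90), quality_thresh=0.5):
    """Average-pool a template: group images by (coarse yaw bin,
    quality above/below threshold), average each group pixelwise.
    Returns one pooled image per non-empty cell."""
    images = np.asarray(images, dtype=np.float64)
    pose_idx = np.digitize(poses, pose_bins[1:-1])     # coarse yaw bins
    qual_idx = (np.asarray(qualities) >= quality_thresh).astype(int)
    pooled = []
    for key in sorted(set(zip(pose_idx, qual_idx))):
        mask = (pose_idx == key[0]) & (qual_idx == key[1])
        pooled.append(images[mask].mean(axis=0))       # average pooling
    return np.stack(pooled)
```

The pooled template is both smaller (one image per cell instead of many) and cheaper to match, which is the storage/computation saving the abstract refers to.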
In this paper, we present two large video multi-modal datasets for RGB and RGB-D gesture recognition: the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD) and the Continuous Gesture Dataset (ConGD). Both datasets are derived from the ChaLearn Gesture Dataset (CGD), which has a total of more than 50000 gestures for the "one-shot-learning" competition. To increase the potential of the old dataset, we designed new, well-curated datasets composed of 249 gesture labels and 47933 gestures, with the begin and end frames of each gesture manually labeled in the sequences. Using these datasets, we will open two competitions on the CodaLab platform so that researchers can test and compare their methods for "user-independent" gesture recognition. The first challenge is designed for gesture spotting and recognition in continuous sequences of gestures, while the second is designed for gesture classification from segmented data. A baseline method based on the bag-of-visual-words model is also presented.
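The bag-of-visual-words encoding used by such baselines can be sketched as follows. The codebook is assumed to be given; in practice it would come from k-means over training descriptors.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Bag-of-visual-words encoding: assign each local descriptor to its
    nearest codeword (Euclidean distance) and return the L1-normalized
    histogram of codeword assignments for the video or image."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # squared distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram is what a classifier (e.g. an SVM) would consume, regardless of how many local descriptors the gesture clip produced.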
Matching with hidden information, which is available only during training and not during testing, has recently become an important research problem. Matching data from two different modalities, known as cross-modal matching, is another challenging problem due to the large variations in data coming from different modalities. Often, these are treated as two independent problems. But for applications like matching RGBD data, when only one modality is available during testing, the task can reduce to either of the two problems. In this work, we propose a framework which handles both these scenarios seamlessly, with applications to matching RGBD data of Lambertian objects. The proposed approach jointly uses the RGB and depth data to learn an illumination-invariant canonical version of the objects. Dictionaries are learned for the RGB, depth, and canonical data, such that the transformed sparse coefficients of the RGB and depth data are equal to those of the canonical data. Given RGB or depth data, the sparse coefficients corresponding to its canonical version are computed and can be used directly for matching with a Mahalanobis metric. Extensive experiments on three datasets, EURECOM, VAP RGB-D-T, and the Texas 3D Face Recognition database, show the effectiveness of the proposed framework.
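A toy sketch of the matching stage, assuming a greedy orthogonal matching pursuit as the sparse coder and a given metric M. Both are stand-ins; the paper's joint dictionary learning is not reproduced here.

```python
import numpy as np

def sparse_code_omp(x, D, k=3):
    """Greedy orthogonal matching pursuit: approximate x with at most k
    atoms of dictionary D (columns assumed unit-norm)."""
    x = np.asarray(x, dtype=float)
    D = np.asarray(D, dtype=float)
    residual, support = x.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.abs(D.T @ residual).argmax())   # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef        # re-fit on support
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z

def mahalanobis(a, b, M):
    """Match score between two coefficient vectors under a learned metric
    M (assumed symmetric positive semi-definite); lower is better."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(d @ M @ d))
```

With the canonical-identical dictionaries the abstract describes, RGB and depth inputs of the same object would map to nearly equal coefficient vectors, so their Mahalanobis distance would be small.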
Object proposal has been successfully applied in recent visual object detection approaches and has shown improved computational efficiency. The purpose of object proposal is to use as few regions as possible to cover as many objects as possible. In this paper, we propose a strategy named Texture Complexity based Redundant Regions Ranking (TCR) for object proposal. Our approach first produces rich but redundant regions using a color segmentation approach, i.e., Selective Search. It then uses Texture Complexity (TC), based on the number of complete contours and Local Binary Pattern (LBP) entropy, to measure the objectness score of each region. By ranking based on TC, it is expected that as many true object regions as possible are preserved while the number of regions is significantly reduced. Experimental results on the PASCAL VOC 2007 dataset show that the proposed TCR significantly improves the baseline approach, increasing AUC (area under the recall curve) from 0.39 to 0.48. It also outperforms the state of the art in AUC and uses fewer detection proposals to achieve comparable recall rates.
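The LBP-entropy half of the TC score can be sketched in numpy as below. The complete-contour count is omitted, and the 8-neighbour encoding details are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def lbp_entropy(gray):
    """Entropy (bits) of the 8-neighbour Local Binary Pattern histogram
    of a grayscale patch, a rough texture-complexity cue: flat regions
    score near 0, textured regions score higher."""
    g = np.asarray(gray, dtype=np.int32)
    c = g[1:-1, 1:-1]                              # interior pixels
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)  # one bit per neighbour
    p = np.bincount(code.ravel(), minlength=256) / code.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Ranking candidate regions by a score like this pushes flat, object-free regions to the bottom of the list, which is how the region count gets reduced.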
Recognizing facial expression in a wild setting has remained a challenging task in computer vision. The World Wide Web is a good source of facial images, most of which are captured in uncontrolled conditions. In fact, the Internet is a World Wild Web of facial images with expressions. This paper presents the results of a new study on collecting, annotating, and analyzing wild facial expressions from the web. Three search engines were queried using 1250 emotion-related keywords in six different languages, and the retrieved images were mapped by two annotators to six basic expressions and neutral. Deep neural networks and noise modeling were used in three different training scenarios to determine how accurately facial expressions can be recognized when training on noisy images collected from the web using query terms (e.g., happy face, laughing man). The results of our experiments show that deep neural networks can recognize wild facial expressions with an accuracy of 82.12%.
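One common way to realize label-noise modeling is a forward noise model: a transition matrix maps the classifier's clean-label posterior to a noisy-label posterior, so the loss can be taken against the noisy web labels. This is a generic sketch, not necessarily the authors' exact formulation.

```python
import numpy as np

def noisy_label_posterior(clean_probs, transition):
    """Forward label-noise model: fold a row-stochastic transition matrix
    T (T[i, j] = P(noisy label j | true label i)) over the classifier's
    clean-label posterior.  T itself would be estimated or learned from
    the annotator agreement data; here it is simply an input."""
    return np.asarray(clean_probs, dtype=float) @ np.asarray(transition, dtype=float)
```

Training the network to match noisy labels through T, instead of directly, keeps systematic query-term noise from corrupting the learned expression classes.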
We propose a Gaussian Conditional Random Field (GCRF) approach to modeling the non-stationary distortions introduced by changing facial expressions during acquisition. While previous work employed a Gaussian Markov Random Field (GMRF) to perform deformation-tolerant matching of periocular images, we show that the approach is not well suited for facial images, which can contain significantly larger and more complex deformations across the image. Like the GMRF, the GCRF tries to find the maximum-scoring assignment between a match pair in the presence of non-stationary deformations. However, unlike the GMRF, the GCRF directly computes the posterior probability that the observed deformation is consistent with the distortions exhibited in other authentic match pairs. The difference is the inclusion of a derived mapping between an input comparison and an output deformation score. We evaluate performance on the CMU Multi-PIE facial dataset across all sessions and expressions, finding that the GCRF is significantly more effective at capturing naturally occurring large deformations than the previous GMRF approach.
Deep neural networks usually benefit from unsupervised pre-training, e.g. with auto-encoders. However, the classifier still needs supervised fine-tuning for good discrimination. Besides, due to the limits of fully connected layers, the application of auto-encoders is usually limited to small, well-aligned images. In this paper, we incorporate supervised information to propose a novel formulation, namely the class-encoder, whose training objective is to reconstruct a sample from another sample with an identical label. The class-encoder aims to minimize the intra-class variations in the feature space and to learn good discriminative manifolds at the class scale. We impose the class-encoder as a constraint on the softmax for better supervised training, and extend the reconstruction to the feature level to tackle the parameter-size and translation issues. The experiments show that the class-encoder helps to improve performance on classification and face recognition benchmarks. This could also be a promising direction for fast training of face recognition models.
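The core pairing behind the class-encoder objective can be sketched as follows. This is a numpy illustration of the data side only (pairing and the reconstruction target), not the network training itself.

```python
import numpy as np

def intra_class_targets(labels, seed=None):
    """For each sample, pick another sample of the same class as its
    reconstruction target: the pairing behind the class-encoder objective
    (reconstruct a sample from a different same-label sample)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    targets = np.empty(len(labels), dtype=int)
    for i, y in enumerate(labels):
        pool = np.flatnonzero(labels == y)
        if len(pool) > 1:
            pool = pool[pool != i]     # avoid trivial self-reconstruction
        targets[i] = rng.choice(pool)
    return targets

def class_encoder_loss(decoded, inputs, targets):
    """Mean squared error between decoded outputs and the *paired*
    same-class samples, which penalizes intra-class variation."""
    decoded = np.asarray(decoded, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    return float(((decoded - inputs[targets]) ** 2).mean())
```

Because the target is a different sample of the same class, the loss is only zero when same-class samples map to the same reconstruction, which is exactly the intra-class-variation minimization the abstract describes.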
Weakly supervised methods have recently become one of the most popular machine learning methods, since they can be applied to large-scale datasets without the critical requirement of richly annotated data. In this paper, we present a novel, self-taught, discriminative facial feature analysis approach in the weakly supervised framework. Our method can find regions which are discriminative across classes yet consistent within a class, and can solve many face-related problems. The proposed method first trains a deep face model with high discriminative capability to extract facial features. Hypercolumn features are then used to give a pixel-level representation for better classification performance along with discriminative region detection. In addition, calibration approaches are proposed to enable the system to deal with multi-class and mixed-class problems. The system is also able to detect multiple discriminative regions in one image. Our uniform method achieves competitive results in various face analysis applications, such as occlusion detection, face recognition, gender classification, twins verification, and facial attractiveness analysis.
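Hypercolumn features, as used above, can be sketched by upsampling each layer's feature map to a common grid and stacking the channels, giving every pixel a descriptor drawn from all depths of the network. Nearest-neighbour indexing here is an illustrative choice.

```python
import numpy as np

def hypercolumn(feature_maps, out_hw):
    """Per-pixel hypercolumn features: upsample each layer's (C, h, w)
    feature map to a common (H, W) grid with nearest-neighbour indexing
    and concatenate along the channel axis."""
    H, W = out_hw
    cols = []
    for fm in feature_maps:
        C, h, w = fm.shape
        yi = np.arange(H) * h // H     # nearest source row per output row
        xi = np.arange(W) * w // W     # nearest source column per output column
        cols.append(fm[:, yi[:, None], xi[None, :]])
    return np.concatenate(cols, axis=0)   # shape (sum of C, H, W)
```

Slicing the result at a single (y, x) position yields that pixel's hypercolumn vector, which is what the discriminative region classifier would operate on.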