ISBN: (print) 9781509014378
We propose a novel approach to template-based face recognition. Our dual goal is to both increase recognition accuracy and reduce the computational and storage costs of template matching. To do this, we leverage an approach which has proven effective in many other domains but, to our knowledge, has never been fully explored for face images: average pooling of face photos. We show how (and why!) the space of a template's images can be partitioned and then pooled based on image quality and head pose, and the effect this has on accuracy and template size. We perform extensive tests on the IJB-A and Janus CS2 template-based face identification and verification benchmarks. These show not only that our approach outperforms the published state of the art despite requiring far fewer cross-template comparisons, but also, surprisingly, that image pooling performs on par with deep feature pooling.
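As a minimal sketch of the pooling idea (not the paper's exact binning), the following assumes each aligned face image comes with precomputed quality and yaw estimates; the quality threshold and yaw edges are illustrative placeholders:

```python
from collections import defaultdict
import numpy as np

def pool_template(images, qualities, yaws, yaw_edges=(-30.0, 30.0)):
    """Partition a template's aligned face images into (quality, pose) bins
    and average-pool each bin, shrinking the template to a handful of images."""
    bins = defaultdict(list)
    for img, q, yaw in zip(images, qualities, yaws):
        q_bin = int(q > 0.5)                      # illustrative two-way quality split
        p_bin = int(np.digitize(yaw, yaw_edges))  # left / frontal / right pose bucket
        bins[(q_bin, p_bin)].append(np.asarray(img, dtype=np.float64))
    # One pooled (pixel-wise mean) image per occupied bin.
    return [np.mean(stack, axis=0) for stack in bins.values()]
```

Matching then compares the few pooled images across templates instead of all raw image pairs, which is where the cross-template comparison savings come from.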
In several domains, including healthcare and home automation, it is important to unobtrusively monitor the activities of daily living (ADLs) executed by people at home. A popular approach consists in the use of sensors attached to everyday objects to capture user interaction, and of ADL models to recognize the current activity based on the temporal sequence of used objects. However, both knowledge-based and data-driven approaches to object-based ADL recognition have issues that limit their applicability in real-world deployments. Hence, in this paper, we pursue an alternative approach, which consists in mining ADL models from the Web. Existing attempts in this direction are mainly based on Web page mining and lexical analysis. One issue with those attempts is the high level of noise found in the textual content of Web pages. To overcome that issue, our intuition is that pictures illustrating the execution of a given activity offer much more compact and expressive information than the textual content of a Web page about the same activity. Hence, we present a novel method that couples Web mining and computer vision to automatically extract ADL models from visual items. Our method relies on Web image search engines to select the most relevant pictures for each considered activity. We use off-the-shelf computer vision APIs and a lexical database to extract the key objects appearing in those pictures. We introduce a probabilistic technique to measure the relevance between activities and objects. Through experiments with a large dataset of real-world ADLs, we show that our method significantly improves on the existing approach.
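A toy sketch of the relevance idea: given the object labels extracted (by any off-the-shelf vision API) from the top-ranked web images of an activity, estimate P(object | activity) with simple add-one smoothing. The paper's actual probabilistic measure may differ; the data below are invented for illustration:

```python
from collections import Counter

def object_relevance(images_objects, vocabulary):
    """images_objects: list of object-label lists, one per retrieved image."""
    counts = Counter(obj for objs in images_objects for obj in objs)
    total = sum(counts.values()) + len(vocabulary)        # add-one smoothing
    return {obj: (counts[obj] + 1) / total for obj in vocabulary}

# Example: pictures retrieved for the activity "cooking".
cooking = object_relevance(
    [["pan", "stove"], ["pan", "knife"], ["stove"]],
    vocabulary=["pan", "stove", "knife", "toothbrush"],
)
print(max(cooking, key=cooking.get))  # -> "pan"
```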
ISBN: (print) 9781509014378
Person-independent and pose-invariant estimation of eye gaze is important for situation analysis and for automated video annotation. We propose a fast cascade-regression-based method that first estimates the location of a dense set of markers and their visibility, then reconstructs face shape by fitting a part-based 3D model. Next, the reconstructed 3D shape is used to estimate a canonical view of the eyes for 3D gaze estimation. The model operates in a feature space that naturally encodes local ordinal properties of pixel intensities, leading to photometrically invariant estimation of gaze. To evaluate the algorithm against alternative approaches, three publicly available databases were used: the Boston University Head Tracking, Multi-View Gaze, and CAVE Gaze datasets. Precision for head pose and gaze averaged 4 degrees or less for pitch, yaw, and roll. The algorithm outperformed alternative methods on both gaze datasets.
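The "local ordinal" feature space can be illustrated with a census-transform-style encoding, one common instance of such features (the paper's exact encoding may differ). Because the code depends only on brightness orderings, it is unchanged under any monotonic photometric transform:

```python
import numpy as np

def ordinal_encode(img):
    """8-bit code per pixel: which of the 8 neighbors is brighter."""
    img = np.asarray(img, dtype=np.float64)
    pad = np.pad(img, 1, mode="edge")
    code = np.zeros(img.shape, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(shifts):
        neighbor = pad[1 + dy : 1 + dy + img.shape[0],
                       1 + dx : 1 + dx + img.shape[1]]
        code |= (neighbor > img).astype(np.uint8) << bit
    return code  # invariant to monotonic intensity changes
```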
ISBN: (print) 9781467388511
Feature learning with deep models has achieved impressive results for both data representation and classification in various vision tasks. Deep feature learning, however, typically requires a large amount of training data, which may not be feasible in some application domains. Transfer learning can alleviate this problem by transferring data from a data-rich source domain to a data-scarce target domain. Existing transfer learning methods typically perform one-shot transfer and often ignore the specific properties that the transferred data must satisfy. To address these issues, we introduce a constrained deep transfer feature learning method that performs transfer learning and feature learning simultaneously: transfer is carried out iteratively in a progressively improving feature space, better narrowing the gap between the target and source domains for effective transfer of data from source to target. Furthermore, we propose to exploit target domain knowledge and incorporate such prior knowledge as constraints during transfer learning, ensuring that the transferred data satisfy certain properties of the target domain. To demonstrate the effectiveness of the proposed constrained deep transfer feature learning method, we apply it to thermal feature learning for eye detection by transferring from the visible domain. We also apply it to cross-view facial expression recognition as a second application. The experimental results demonstrate the effectiveness of the proposed method for both applications.
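A toy, runnable sketch of the alternating loop, with a linear stand-in for the deep feature learner (the paper uses deep models; everything here, including the proximity constraint, is a simplifying assumption): repeatedly learn features from target data plus accepted transfers, then re-admit only the source samples that satisfy a target-domain constraint in the current feature space.

```python
import numpy as np

def constrained_transfer(source, target, radius=2.0, n_iters=5, dim=8):
    """source, target: (n, d) and (m, d) sample matrices."""
    transferred = np.empty((0, source.shape[1]))
    for _ in range(n_iters):
        pool = np.vstack([target, transferred])
        mean = pool.mean(axis=0)
        # "Feature learning" stand-in: top principal directions of the pool.
        _, _, vt = np.linalg.svd(pool - mean, full_matrices=False)
        proj = vt[:dim].T
        feats = (source - mean) @ proj
        center = (target - mean).mean(axis=0) @ proj
        # Constraint stand-in: keep transfers close to the target centroid.
        keep = np.linalg.norm(feats - center, axis=1) < radius
        transferred = source[keep]
    return proj, transferred
```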
ISBN: (print) 9781509014378
In this paper, we present two large video multi-modal datasets for RGB and RGB-D gesture recognition: the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD) and the Continuous Gesture Dataset (ConGD). Both datasets are derived from the ChaLearn Gesture Dataset (CGD), which contains more than 50000 gestures in total for the "one-shot-learning" competition. To increase the potential of the old dataset, we designed new, well-curated datasets comprising 249 gesture labels and including 47933 gestures whose begin and end frames are manually labeled within the sequences. Using these datasets, we will open two competitions on the CodaLab platform so that researchers can test and compare their methods for "user independent" gesture recognition. The first challenge is designed for gesture spotting and recognition in continuous sequences of gestures, while the second is designed for gesture classification from segmented data. A baseline method based on the bag-of-visual-words model is also presented.
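For reference, a bag-of-visual-words pipeline can be sketched roughly as follows, assuming local descriptors have already been extracted per gesture clip; the codebook size and linear classifier are illustrative choices, not necessarily the challenge baseline's exact configuration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bovw_histograms(clips_descriptors, n_words=256, seed=0):
    """clips_descriptors: list of (n_i, d) arrays, one per gesture clip."""
    codebook = KMeans(n_clusters=n_words, random_state=seed)
    codebook.fit(np.vstack(clips_descriptors))
    hists = []
    for desc in clips_descriptors:
        words = codebook.predict(desc)
        h = np.bincount(words, minlength=n_words).astype(float)
        hists.append(h / max(h.sum(), 1.0))   # L1-normalized word histogram
    return codebook, np.array(hists)

# codebook, X = bovw_histograms(train_clips)
# clf = LinearSVC().fit(X, gesture_labels)
```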
ISBN: (print) 9781509014378
Matching with hidden information, which is available only during training and not during testing, has recently become an important research problem. Matching data from two different modalities, known as cross-modal matching, is another challenging problem due to the large variations in data coming from different modalities. Often, these are treated as two independent problems. But for applications like matching RGBD data, when only one modality is available during testing, the task can reduce to either of the two problems. In this work, we propose a framework that handles both these scenarios seamlessly, with applications to matching RGBD data of Lambertian objects. The proposed approach jointly uses the RGB and depth data to learn an illumination-invariant canonical version of the objects. Dictionaries are learnt for the RGB, depth, and canonical data such that the transformed sparse coefficients of the RGB and depth data are equal to those of the canonical data. Given RGB or depth data, the sparse coefficients corresponding to its canonical version are computed and can be used directly for matching with a Mahalanobis metric. Extensive experiments on three datasets (EURECOM, VAP RGB-D-T, and the Texas 3D Face Recognition database) show the effectiveness of the proposed framework.
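A sketch of the matching step only, assuming the per-modality dictionary and the Mahalanobis metric M have already been learned offline (the dictionary-learning objective itself is not shown, and shapes are illustrative):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def match_score(probe, gallery_code, dictionary, M, alpha=0.1):
    """probe: (n_features,) RGB or depth vector; dictionary: (n_atoms, n_features);
    gallery_code: (n_atoms,) sparse code of the enrolled sample."""
    # Sparse coefficients stand in for the probe's canonical version.
    code = sparse_encode(probe[None, :], dictionary,
                         algorithm="lasso_lars", alpha=alpha)[0]
    diff = code - gallery_code
    return float(diff @ M @ diff)   # Mahalanobis distance; smaller = better match
```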
ISBN: (print) 9781509014378
Object proposal has been successfully applied in recent visual object detection approaches and has shown improved computational efficiency. The purpose of object proposal is to use as few regions as possible to cover as many objects as possible. In this paper, we propose a strategy named Texture Complexity based Redundant Regions Ranking (TCR) for object proposal. Our approach first produces rich but redundant regions using a color segmentation approach, i.e., Selective Search. It then uses Texture Complexity (TC), based on complete contour number and Local Binary Pattern (LBP) entropy, to measure the objectness score of each region. By ranking based on TC, as many true object regions as possible are preserved while the number of regions is significantly reduced. Experimental results on the PASCAL VOC 2007 dataset show that the proposed TCR significantly improves on the baseline approach, increasing the AUC (area under the recall curve) from 0.39 to 0.48. It also outperforms the state of the art in AUC and uses fewer detection proposals to achieve comparable recall rates.
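The LBP-entropy half of the TC score can be sketched as below (the complete-contour-number term is omitted, and how the two terms are combined and weighted is not specified here):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_entropy(gray_region, P=8, R=1.0):
    """Entropy of the uniform-LBP histogram of a grayscale region crop."""
    lbp = local_binary_pattern(gray_region, P, R, method="uniform")
    hist = np.bincount(lbp.astype(int).ravel(), minlength=P + 2).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())   # higher = richer local texture
```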
ISBN: (print) 9781509014378
We propose a Gaussian Conditional Random Field (GCRF) approach to modeling the non-stationary distortions that are introduced by changing facial expressions during acquisition. While previous work employed a Gaussian Markov Random Field (GMRF) to perform deformation-tolerant matching of periocular images, we show that this approach is not well suited for facial images, which can contain significantly larger and more complex deformations across the image. Like the GMRF, the GCRF tries to find the maximum-scoring assignment between a match pair in the presence of non-stationary deformations. However, unlike the GMRF, the GCRF directly computes the posterior probability that the observed deformation is consistent with the distortions exhibited in other authentic match pairs. The difference is the inclusion of a derived mapping between an input comparison and an output deformation score. We evaluate performance on the CMU Multi-PIE facial dataset across all sessions and expressions, finding that the GCRF is significantly more effective than the previous GMRF approach at capturing naturally occurring large deformations.
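The posterior intuition can be illustrated with a heavily simplified stand-in: score an observed deformation field by how probable it is under a Gaussian fit to deformations from known authentic pairs. This is only the idea in miniature, not the paper's graphical model:

```python
import numpy as np
from scipy.stats import multivariate_normal

def deformation_score(deformation, authentic_deformations):
    """deformation: flattened (dx, dy) field, shape (d,);
    authentic_deformations: (n, d) fields from genuine match pairs."""
    mu = authentic_deformations.mean(axis=0)
    cov = np.cov(authentic_deformations, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])   # regularize for numerical stability
    return multivariate_normal.logpdf(deformation, mean=mu, cov=cov)
```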
ISBN: (print) 9781509014378
Recognizing facial expression in the wild remains a challenging task in computer vision. The World Wide Web is a good source of facial images, most of which are captured in uncontrolled conditions. In fact, the Internet is a World Wild Web of facial images with expressions. This paper presents the results of a new study on collecting, annotating, and analyzing wild facial expressions from the web. Three search engines were queried using 1250 emotion-related keywords in six different languages, and the retrieved images were mapped by two annotators to the six basic expressions plus neutral. Deep neural networks and noise modeling were used in three different training scenarios to determine how accurately facial expressions can be recognized when training on noisy images collected from the web using query terms (e.g., happy face, laughing man). The results of our experiments show that deep neural networks can recognize wild facial expressions with an accuracy of 82.12%.
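One generic form of label-noise modeling for web-collected data (an illustration of the technique class, not necessarily this paper's model): wrap the classifier's softmax with an estimated label-transition matrix T, where T[i, j] = P(observed label j | true label i), so the loss accounts for query-based annotation noise.

```python
import torch
import torch.nn.functional as F

def noisy_label_loss(logits, observed_labels, T):
    """logits: (batch, n_classes); observed_labels: (batch,) noisy web labels;
    T: (n_classes, n_classes) row-stochastic transition matrix (assumed given)."""
    clean_probs = F.softmax(logits, dim=1)
    noisy_probs = clean_probs @ T            # predicted observed-label distribution
    return F.nll_loss(torch.log(noisy_probs + 1e-8), observed_labels)
```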
Distributed algorithms have recently gained immense popularity. Among computer vision applications, distributed multi-target tracking in a camera network is a fundamental problem. The goal is for all cameras to have accurate state estimates for all targets. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Vision-based distributed multi-target state estimation has at least two characteristics that distinguish it from other applications. First, cameras are directional sensors, and neighboring sensors often may not be sensing the same targets, i.e., they are naive with respect to those targets. Second, in the presence of clutter and multiple targets, each camera must solve a data association problem. This paper presents an information-weighted, consensus-based, distributed multi-target tracking algorithm, referred to as the Multi-Target Information Consensus (MTIC) algorithm, that is designed to address both the naivety and the data association problems. It converges to the centralized minimum mean square error estimate. The proposed MTIC algorithm and its extension to non-linear camera models, termed the Extended MTIC (EMTIC), are robust to false measurements and to limited resources such as power, bandwidth, and real-time operational requirements. Simulation and experimental analysis are provided to support the theoretical results.
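A sketch of a single information-weighted consensus iteration: each camera keeps an information-form estimate per target (Y = inverse covariance, y = Y @ state) and averages with its communication neighbors, so naive cameras carrying little information do not dilute informed ones. The weights and schedule below are illustrative, not the exact MTIC recursion:

```python
import numpy as np

def consensus_step(Y, y, neighbors, rate=0.5):
    """Y: dict cam -> (d, d) information matrix; y: dict cam -> (d,) vector;
    neighbors: dict cam -> list of neighboring camera ids."""
    Y_new, y_new = {}, {}
    for cam, nbrs in neighbors.items():
        w = rate / max(len(nbrs), 1)
        Y_new[cam] = Y[cam] + w * sum(Y[n] - Y[cam] for n in nbrs)
        y_new[cam] = y[cam] + w * sum(y[n] - y[cam] for n in nbrs)
    return Y_new, y_new

# After iterating to convergence, each camera recovers its state estimate
# as np.linalg.solve(Y[cam], y[cam]).
```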