ISBN (print): 9781509014378
Deep learning based approaches have been dominating the face recognition field due to the significant performance improvements they have provided on challenging wild datasets. These approaches have been extensively tested on unconstrained datasets such as Labeled Faces in the Wild and YouTube Faces. However, their capability to handle individual appearance variations caused by factors such as head pose, illumination, occlusion, and misalignment has not been thoroughly assessed to date. In this paper, we present a comprehensive study to evaluate the performance of deep learning based face representation under several conditions, including varying head pose angles, upper and lower face occlusion, changing illumination of different strengths, and misalignment due to erroneous facial feature localization. Two successful and publicly available deep learning models, namely VGG-Face and Lightened CNN, have been utilized to extract face representations. The obtained results show that, although deep learning provides a powerful representation for face recognition, it can still benefit from preprocessing, for example pose and illumination normalization, to achieve better performance under various conditions. In particular, if these variations are not included in the dataset used to train the deep learning model, the role of preprocessing becomes more crucial. Experimental results also show that deep learning based representation is robust to misalignment and can tolerate facial feature localization errors of up to 10% of the interocular distance.
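The reported misalignment tolerance can be made concrete with a small sketch that measures facial feature localization error as a fraction of the interocular distance. The function name and the example coordinates below are illustrative, not from the paper:

```python
import numpy as np

def misalignment_ratio(pred_eyes, true_eyes):
    """Localization error as a fraction of the interocular distance.

    pred_eyes / true_eyes: arrays of shape (2, 2) holding the
    (x, y) positions of the left and right eye centers.
    """
    pred = np.asarray(pred_eyes, dtype=float)
    true = np.asarray(true_eyes, dtype=float)
    interocular = np.linalg.norm(true[0] - true[1])
    # Mean landmark error, normalized by the interocular distance
    err = np.linalg.norm(pred - true, axis=1).mean()
    return err / interocular

# Ground-truth eye centers 60 px apart; predictions off by 5 px each
true_eyes = [(100, 120), (160, 120)]
pred_eyes = [(103, 124), (156, 123)]
ratio = misalignment_ratio(pred_eyes, true_eyes)
print(ratio <= 0.10)  # within the tolerance reported in the abstract
```

A localization error of 5 px against a 60 px interocular distance gives a ratio of about 0.083, i.e. inside the 10% tolerance the study reports.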
ISBN (print): 9781509014378
Twin-to-Twin Transfusion Syndrome (TTTS) is a progressive pregnancy complication in which inter-twin vascular connections in the shared placenta result in a blood flow imbalance between the twins. The most effective therapy is to sever these connections by laser photo-coagulation. However, the limited field of view of the fetoscope hinders their identification. A potential solution is to augment the surgeon's view by creating a mosaic image of the placenta. State-of-the-art mosaicking methods use feature-based approaches, which have three main limitations: (i) they are not robust against corrupt data, e.g. blurred frames; (ii) temporal information is not used; (iii) the resulting mosaic suffers from drift. We introduce a probabilistic temporal model that incorporates electromagnetic and visual tracking data to achieve a robust mosaic with reduced drift. By assuming planarity of the imaged object, the nRT decomposition can be used to parametrize the state vector. Finally, we tackle the non-linear nature of the problem in a numerically stable manner by using the Square Root Unscented Kalman Filter. We show improved robustness and reduced drift compared to state-of-the-art methods on synthetic, phantom, and ex vivo datasets.
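The idea of probabilistically fusing the electromagnetic and visual tracking streams can be illustrated in a much-simplified scalar form with a plain linear Kalman filter. The paper itself uses a Square Root Unscented Kalman Filter on an nRT-parametrized state; the random-walk motion model and all noise variances below are assumptions made for this sketch only:

```python
import numpy as np

def kalman_fuse(em_meas, vis_meas, q=1e-3, r_em=0.04, r_vis=0.01):
    """Minimal scalar Kalman filter fusing two measurement streams.

    Illustrative only: the state is a single scalar following a
    random walk, and r_em / r_vis are assumed noise variances for the
    electromagnetic and visual sensors respectively.
    """
    x, p = em_meas[0], 1.0           # initial state and covariance
    track = []
    for z_em, z_vis in zip(em_meas, vis_meas):
        p += q                        # predict step (random walk)
        for z, r in ((z_em, r_em), (z_vis, r_vis)):
            k = p / (p + r)           # Kalman gain
            x += k * (z - x)          # correct with measurement z
            p *= (1 - k)
        track.append(x)
    return np.array(track)

rng = np.random.default_rng(0)
truth = np.linspace(0.0, 1.0, 50)
em = truth + rng.normal(0, 0.2, 50)   # noisier electromagnetic stream
vis = truth + rng.normal(0, 0.1, 50)  # visual stream
fused = kalman_fuse(em, vis)
# The fused track should be closer to the truth than the EM stream alone
print(np.abs(fused - truth).mean() < np.abs(em - truth).mean())
```

The same predict/correct structure carries over to the paper's filter; the unscented variant replaces the linear update with sigma-point propagation to handle the non-linear measurement model.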
ISBN (print): 9781509014378
Recent advances in visual data acquisition and Internet technologies make it convenient and popular to collect and share videos. These activities, however, also raise the issue of privacy invasion. One potential privacy threat is the unauthorized capture and/or sharing of covert videos, which are recorded without the awareness of the subject(s) in the video. Automatic classification of such videos can provide an important basis toward addressing relevant privacy issues. The task is very challenging due to the large intra-class variation and between-class similarity: there is no limit on the content of a covert video, and it may share very similar content with a regular video. This challenge hampers the application of existing content-based video analysis methods to covert video classification. In this paper, we propose a novel descriptor, the codebook growing pattern (CGP), which is derived from latent Dirichlet allocation (LDA) over optical flows. Given an input video V, we first represent it with a sequence of histograms of optical flow (HOF). These HOFs are then fed into LDA to dynamically generate the codebook for V. The CGP descriptor is defined as the sequence of growing codebook sizes in the LDA procedure. CGP fits naturally for covert video representation since (1) optical flows can capture the camera motion that characterizes covert video acquisition, and (2) CGP by itself is insensitive to video content. To evaluate the proposed approach, we collected a large covert video dataset, the first such dataset to our knowledge, and tested the proposed method on it. The results clearly show the effectiveness of the proposed approach in comparison with other state-of-the-art video classification algorithms.
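The HOF-then-codebook pipeline can be sketched in a few lines. This is a deliberate simplification: the paper grows the codebook with LDA, whereas the sketch below approximates a "codeword" by a dominant orientation bin of the HOF and simply records how many distinct codewords have appeared after each frame. All thresholds and sizes are assumptions:

```python
import numpy as np

def hof(flow, n_bins=8):
    """Histogram of optical flow orientations for one frame.

    flow: array of shape (H, W, 2) with per-pixel (dx, dy) motion,
    binned by angle and weighted by magnitude.
    """
    ang = np.arctan2(flow[..., 1], flow[..., 0])          # [-pi, pi]
    mag = np.linalg.norm(flow, axis=-1)
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi),
                           weights=mag)
    return hist / (hist.sum() + 1e-8)

def growing_pattern(hofs, thresh=0.05):
    """Simplified codebook-growing pattern: after each frame, count
    how many distinct dominant-orientation codewords have been seen."""
    seen, pattern = set(), []
    for h in hofs:
        seen.update(np.flatnonzero(h > thresh).tolist())
        pattern.append(len(seen))
    return pattern

# Two synthetic frames: uniform rightward motion, then upward motion
flow_right = np.zeros((4, 4, 2)); flow_right[..., 0] = 1.0
flow_up = np.zeros((4, 4, 2)); flow_up[..., 1] = 1.0
cgp = growing_pattern([hof(flow_right), hof(flow_up)])
print(cgp)  # → [1, 2]: the codebook grows as a new motion pattern appears
```

The growth curve, not the codewords themselves, is the descriptor, which is why the representation stays insensitive to the actual video content.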
ISBN (print): 9781467388511
Motion blur can adversely affect a number of vision tasks, hence it is generally considered a nuisance. We instead treat motion blur as a useful signal that makes it possible to compute the motion of objects from a single image. Drawing on the success of joint segmentation and parametric motion models in the context of optical flow estimation, we propose a parametric object motion model combined with a segmentation mask to exploit localized, non-uniform motion blur. Our parametric image formation model is differentiable w.r.t. the motion parameters, which enables us to generalize marginal-likelihood techniques from uniform blind deblurring to localized, non-uniform blur. A two-stage pipeline, first in derivative space and then in image space, allows us to estimate both parametric object motion and a motion segmentation from a single image alone. Our experiments demonstrate the method's ability to cope with very challenging cases of object motion blur.
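The simplest instance of a parametric blur model is a linear-motion point-spread function, whose two parameters (length and angle) play the role of the motion parameters discussed above. The paper's model is differentiable and spatially varying; this uniform kernel is only an illustration, and the kernel size is an arbitrary choice:

```python
import numpy as np

def linear_blur_psf(length, angle, size=15):
    """Point-spread function for linear object motion.

    The kernel is a line segment of the given pixel length and
    orientation (radians), rasterized by dense sampling and then
    normalized to sum to one.
    """
    psf = np.zeros((size, size))
    c = size // 2
    for t in np.linspace(-length / 2, length / 2, 10 * size):
        x = int(round(c + t * np.cos(angle)))
        y = int(round(c + t * np.sin(angle)))
        if 0 <= x < size and 0 <= y < size:
            psf[y, x] += 1.0
    return psf / psf.sum()

psf = linear_blur_psf(length=7, angle=0.0)  # horizontal, 7 px long
print(psf.shape, np.isclose(psf.sum(), 1.0))
```

Convolving an image with such a kernel only inside a segmentation mask produces exactly the localized, non-uniform blur the abstract describes.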
ISBN (print): 9781467388511
We propose a new way to train a structured output prediction model. More specifically, we train nonlinear data terms in a Gaussian Conditional Random Field (GCRF) by a generalized version of gradient boosting. The approach is evaluated on three challenging regression benchmarks: vessel detection, single image depth estimation and image inpainting. These experiments suggest that the proposed boosting framework matches or exceeds the state-of-the-art.
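The boosting loop being generalized is the standard squared-loss residual-fitting procedure, sketched below with regression stumps on 1-D data. This is only the plain baseline the paper builds on, not the GCRF-aware variant; all hyperparameters are arbitrary:

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on 1-D inputs."""
    best = None
    for s in np.unique(x):
        left, right = residual[x <= s], residual[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= s, left.mean(), right.mean())
        err = ((residual - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    return best[1:]

def boost(x, y, rounds=50, lr=0.1):
    """Plain gradient boosting with squared loss: each round fits a
    stump to the current residual (the negative gradient) and adds a
    shrunken copy of it to the prediction."""
    pred = np.full_like(y, y.mean())
    stumps = []
    for _ in range(rounds):
        s, lv, rv = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= s, lv, rv)
        stumps.append((s, lv, rv))
    return pred, stumps

x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x)
pred, _ = boost(x, y)
# Boosting should beat the constant (mean) predictor by a wide margin
print(((y - pred) ** 2).mean() < ((y - y.mean()) ** 2).mean())
```

In the paper, the weak learners are instead fitted to the gradient of the GCRF objective, so the boosted data terms remain consistent with the structured model.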
Robust and accurate detection of the pupil position is a key building block for head-mounted eye tracking and a prerequisite for applications on top, such as gaze-based human-computer interaction or attention analysis. Despite a large body of work, detecting the pupil in images recorded under real-world conditions is challenging given significant variability in eye appearance (e.g., illumination, reflections, occlusions, etc.), individual differences in eye physiology, as well as other sources of noise, such as contact lenses or make-up. In this paper we review six state-of-the-art pupil detection methods, namely ElSe (Fuhl et al. in Proceedings of the ninth biennial ACM symposium on eye tracking research & applications, ACM, New York, NY, USA, pp 123-130, 2016), ExCuSe (Fuhl et al. in Computer analysis of images and patterns, Springer, New York, pp 39-51, 2015), Pupil Labs (Kassner et al. in Adjunct proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing (UbiComp), pp 1151-1160, 2014. doi: 10.1145/2638728.2641695), SET (Javadi et al. in Front Neuroeng 8, 2015), Starburst (Li et al. in Computer vision and pattern recognition workshops, 2005. IEEE computer society conference on CVPR workshops, IEEE, pp 79-79, 2005), and Swirski (Swirski et al. in Proceedings of the symposium on eye tracking research and applications (ETRA), ACM, pp 173-176, 2012. doi: 10.1145/2168556.2168585). We compare their performance on a large-scale data set consisting of 225,569 annotated eye images taken from four publicly available data sets. Our experimental results show that the algorithm ElSe (Fuhl et al. 2016) outperforms other pupil detection methods by a large margin, thus offering robust and accurate pupil positions on challenging everyday eye images.
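The usual figure of merit in such comparisons is the detection rate: the fraction of images whose predicted pupil center lies within a fixed pixel distance of the ground truth. The 5-pixel threshold and the toy coordinates below are assumptions for illustration, not the benchmark's settings:

```python
import numpy as np

def detection_rate(pred, gt, max_dist=5.0):
    """Fraction of images whose predicted pupil center lies within
    max_dist pixels of the annotated ground-truth center."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    d = np.linalg.norm(pred - gt, axis=1)
    return float((d <= max_dist).mean())

gt = [(50, 40), (52, 41), (48, 39), (51, 40)]
pred = [(51, 40), (60, 45), (48, 42), (50, 38)]
print(detection_rate(pred, gt))  # → 0.75 (one prediction is off by ~9 px)
```

Sweeping max_dist from 0 upward yields the detection-rate curves typically reported when ranking detectors such as those compared here.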
ISBN (print): 9781467388511
This paper is a reaction to the poor performance of symmetry detection algorithms on real-world images, benchmarked since CVPR 2011. Our systematic study reveals a significant difference between human-labeled (reflection and rotation) symmetries on photos and the output of computer vision algorithms on the same photo set. We exploit this human-machine symmetry perception gap by proposing a novel symmetry-based Turing test. Using a comprehensive labeling interface, we collected more than 78,000 symmetry labels from 400 Amazon Mechanical Turk raters on 1,200 photos from the Microsoft COCO dataset. With a set of ground-truth symmetries automatically generated from the noisy human labels, our approach achieves a success rate of over 96% in a separate test. We demonstrate statistically significant outcomes for using symmetry perception as a powerful, alternative, image-based reCAPTCHA.
ISBN (print): 9781467388511
Given an image of a handwritten word, a CNN is employed to estimate its n-gram frequency profile, which is the set of n-grams contained in the word. Frequencies for unigrams, bigrams, and trigrams are estimated for the entire word and for parts of it. Canonical Correlation Analysis is then used to match the estimated profile to the true profiles of all words in a large dictionary. The CNN employs several novelties, such as multiple fully connected branches. Applied to all commonly used handwriting recognition benchmarks, our method outperforms all existing methods by a very large margin.
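The profile-matching step can be sketched as follows. Note two deliberate simplifications: the Canonical Correlation Analysis of the paper is replaced here by plain cosine similarity over binary n-gram profiles, and the CNN's output is simulated by corrupting a true profile; the tiny dictionary is invented for illustration:

```python
import numpy as np
from itertools import chain

def ngram_profile(word, vocab, max_n=3):
    """Binary profile: which n-grams (n = 1..max_n) occur in word."""
    grams = set(chain.from_iterable(
        (word[i:i + n] for i in range(len(word) - n + 1))
        for n in range(1, max_n + 1)))
    return np.array([g in grams for g in vocab], dtype=float)

def match(estimated, dictionary, vocab):
    """Return the dictionary word whose profile is most similar
    (cosine similarity stands in for the paper's CCA matching)."""
    best, best_sim = None, -1.0
    for w in dictionary:
        p = ngram_profile(w, vocab)
        sim = estimated @ p / (
            np.linalg.norm(estimated) * np.linalg.norm(p) + 1e-8)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

dictionary = ["cat", "cart", "card", "care"]
vocab = sorted(set(chain.from_iterable(
    (w[i:i + n] for n in range(1, 4) for i in range(len(w) - n + 1))
    for w in dictionary)))
# Simulated CNN output: profile of "card" with one bit flipped
est = ngram_profile("card", vocab)
est[0] = 1 - est[0]
print(match(est, dictionary, vocab))  # → 'card'
```

Even with a corrupted bit the true word wins, since its profile still shares far more n-grams with the estimate than any other dictionary entry.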
ISBN (print): 9781467388511
This paper deals with the extraction of multiple models from noisy or outlier-contaminated data. We cast the multi-model fitting problem in terms of set coverage, deriving a simple and effective method that generalizes RANSAC to multiple models and handles intersecting structures and outliers in a straightforward, principled manner, while avoiding the typical shortcomings of sequential approaches and of clustering. The method compares favorably against the state of the art on simulated and publicly available real datasets.
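The set-coverage view can be sketched for the special case of fitting lines to 2-D points: sample line hypotheses from point pairs as in RANSAC, then greedily keep the hypothesis whose consensus set covers the most not-yet-covered points. This greedy sketch is only one simple reading of the coverage formulation, and every parameter below is an arbitrary choice:

```python
import numpy as np

def line_inliers(points, p1, p2, tol=0.05):
    """Indices of points within tol of the line through p1 and p2."""
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / (np.linalg.norm(d) + 1e-12)
    dist = np.abs((points - p1) @ n)
    return set(np.flatnonzero(dist < tol).tolist())

def multi_ransac(points, n_models=2, n_hyp=200, seed=0):
    """Greedy set-coverage multi-model fitting: sample line
    hypotheses from point pairs, then repeatedly keep the hypothesis
    covering the most uncovered points."""
    rng = np.random.default_rng(seed)
    hyps = []
    for _ in range(n_hyp):
        i, j = rng.choice(len(points), 2, replace=False)
        hyps.append(line_inliers(points, points[i], points[j]))
    covered, models = set(), []
    for _ in range(n_models):
        best = max(hyps, key=lambda h: len(h - covered))
        models.append(best - covered)
        covered |= best
    return models

# Two intersecting noisy lines: y = x and y = 1 - x, 30 points each
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
pts = np.concatenate([
    np.stack([x, x + rng.normal(0, 0.01, 30)], axis=1),
    np.stack([x, 1 - x + rng.normal(0, 0.01, 30)], axis=1)])
models = multi_ransac(pts)
print([len(m) for m in models])  # each model covers most of one line
```

Because coverage is computed against the not-yet-covered set, points near the intersection are assigned to only one structure, which is how the coverage view sidesteps the double-counting problems of purely sequential fit-and-remove schemes.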