Face recognition under viewpoint and illumination changes is a difficult problem, so many researchers have tried to solve this problem by producing the pose- and illumination- invariant feature. Zhu et al. [26] change...
详细信息
ISBN:
(纸本)9781467369640
Face recognition under viewpoint and illumination changes is a difficult problem, so many researchers have tried to solve this problem by producing the pose- and illumination- invariant feature. Zhu et al. [26] changed all arbitrary pose and illumination images to the frontal view image to use for the invariant feature. In this scheme, preserving identity while rotating pose image is a crucial issue. This paper proposes a new deep architecture based on a novel type of multitask learning, which can achieve superior performance in rotating to a target-pose face image from an arbitrary pose and illumination image while preserving identity. The target pose can be controlled by the user's intention. This novel type of multi-task model significantly improves identity preservation over the single task model. By using all the synthesized controlled pose images, called Controlled Pose Image (CPI), for the poseillumination- invariant feature and voting among the multiple face recognition results, we clearly outperform the state-of-the-art algorithms by more than 4 similar to 6% on the MultiPIE dataset.
Pose variation remains one of the major factors adversely affect the accuracy of real-world face recognition systems. Inspired by the recently proposed probabilistic elastic part (PEP) model and the success of the dee...
详细信息
ISBN:
(纸本)9781467369640
Pose variation remains one of the major factors adversely affect the accuracy of real-world face recognition systems. Inspired by the recently proposed probabilistic elastic part (PEP) model and the success of the deep hierarchical architecture in a number of visual tasks, we propose the Hierarchical-PEP model to approach the unconstrained face recognition problem. We apply the PEP model hierarchically to decompose a face image into face parts at different levels of details to build pose-invariant part-based face representations. Following the hierarchy from bottom-up, we stack the face part representations at each layer, discriminatively reduce its dimensionality, and hence aggregate the face part representations layer-by-layer to build a compact and invariant face representation. The Hierarchical-PEP model exploits the fine-grained structures of the face parts at different levels of details to address the pose variations. It is also guided by supervised information in constructing the face part/face representations. We empirically verify the Hierarchical-PEP model on two public benchmarks (i.e., the LFW and YouTube Faces) and a face recognition challenge (i.e., the PaSC grand challenge) for image-based and video-based face verification. The state-of-the-art performance demonstrates the potential of our method.
The success of an intelligent robotic system depends on the performance of its vision-system which in turn depends to a great extend upon the quality of its calibration. During the execution of a task the vision-syste...
详细信息
ISBN:
(纸本)0780342364
The success of an intelligent robotic system depends on the performance of its vision-system which in turn depends to a great extend upon the quality of its calibration. During the execution of a task the vision-system is subject to external influences such as vibrations, thermal expansion etc. which affect and possibly render invalid the initial calibration. Moreover it is possible that the parameters of the vision-system like e.g. the zoom or the focus are altered intentionally in order to perform specific vision-tasks. This paper describes a technique for automatically maintaining calibration of stereovision systems over time without using again any particular calibration apparatus. It uses all available information, i.e. both spatial and temporal data. Uncertainty is systematically manipulated and maintained. Synthetical and real data are used to validate the proposed technique, and the results compare very favourably with those given by classical calibration methods.
This paper concerns action recognition from unseen and unknown views. We propose unsupervised learning of a non-linear model that transfers knowledge from multiple views to a canonical view. The proposed Non-linear Kn...
详细信息
ISBN:
(纸本)9781467369640
This paper concerns action recognition from unseen and unknown views. We propose unsupervised learning of a non-linear model that transfers knowledge from multiple views to a canonical view. The proposed Non-linear Knowledge Transfer Model (NKTM) is a deep network, with weight decay and sparsity constraints, which finds a shared high-level virtual path from videos captured from different unknown viewpoints to the same canonical view. The strength of our technique is that we learn a single NKTM for all actions and all camera viewing directions. Thus, NKTM does not require action labels during learning and knowledge of the camera viewpoints during training or testing. NKTM is learned once only from dense trajectories of synthetic points fitted to mocap data and then applied to real video data. Trajectories are coded with a general codebook learned from the same mocap data. NKTM is scalable to new action classes and training data as it does not require re-learning. Experiments on the IX MAS and N-UCLA datasets show that NKTM outperforms existing state-of-the-art methods for cross-view action recognition.
In this contribution we present an algorithm for tracking non-rigid, moving objects in a sequence of colored images, which were recorded by a non-stationary camera. The application background is vision-based driving a...
详细信息
ISBN:
(纸本)0780342364
In this contribution we present an algorithm for tracking non-rigid, moving objects in a sequence of colored images, which were recorded by a non-stationary camera. The application background is vision-based driving assistance in the inner city In an initial step, object parts are determined by a divisive clustering algorithm, which is applied to all pixels in the first image of the sequence. The feature space is defined by the color and position of a pixel. For each new image the clusters of the previous image are adapted iteratively by a parallel k-means clustering algorithm. Instead of tracking single points, edges, or areas over a sequence of images, only the centroids of the clusters are tracked. The proposed method remarkably simplifies the correspondence problem and also ensures a robust tracking behavior.
During a fixed axis camera rotation every image point is moving on a conic section. If the point is a vanishing point the conic section is invariant to possible translations of the observer. Given the rotation axis an...
详细信息
ISBN:
(纸本)0818672587
During a fixed axis camera rotation every image point is moving on a conic section. If the point is a vanishing point the conic section is invariant to possible translations of the observer. Given the rotation axis and the inter-frame correspondence of a set of parallel lines we are able to compute the intrinsic parameters without knowledge of the rotation angles. We propagate the error covariances and we remove the bias in the computation of the conic. We experimentally study the sensitivity of calibration to the amount of rotation and we compare our performance to the performance of a recent active calibration technique.
We propose a framework that performs action recognition and identity maintenance of multiple targets simultaneously. Instead of first establishing tracks using an appearance model and then performing action recognitio...
详细信息
ISBN:
(纸本)9781467312288
We propose a framework that performs action recognition and identity maintenance of multiple targets simultaneously. Instead of first establishing tracks using an appearance model and then performing action recognition, we construct a network flow-based model that links detected bounding boxes across video frames while inferring activities, thus integrating identity maintenance and action recognition. Inference in our model reduces to a constrained minimum cost flow problem, which we solve exactly and efficiently. By leveraging both appearance similarity and action transition likelihoods, our model improves on state-of-the-art results on action recognition for two datasets.
Given an image of a handwritten word, a CNN is employed to estimate its n-gram frequency profile, which is the set of n-grams contained in the word. Frequencies for unigrams, bigrams and trigrams are estimated for the...
详细信息
ISBN:
(纸本)9781467388511
Given an image of a handwritten word, a CNN is employed to estimate its n-gram frequency profile, which is the set of n-grams contained in the word. Frequencies for unigrams, bigrams and trigrams are estimated for the entire word and for parts of it. Canonical Correlation Analysis is then used to match the estimated profile to the true profiles of all words in a large dictionary. The CNN that is used employs several novelties such as the use of multiple fully connected branches. Applied to all commonly used handwriting recognition benchmarks, our method outperforms, by a very large margin, all existing methods.
Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-cla...
详细信息
ISBN:
(纸本)9780769549897
Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of our proposed approach.
One of the central problems in stereo matching (and other image registration tasks) is the selection of optimal window sizes for comparing image regions. This paper addresses this problem with some novel algorithms ba...
详细信息
ISBN:
(纸本)0818672587
One of the central problems in stereo matching (and other image registration tasks) is the selection of optimal window sizes for comparing image regions. This paper addresses this problem with some novel algorithms based on iteratively diffusing support at different disparity hypotheses, and locally controlling the amount of diffusion based on the current quality of the disparity estimate. It also develops a novel Bayesian estimation technique which significantly outperforms techniques based on area-based matching (SSD) and regular diffusion. We provide experimental results on both synthetic and real stereo image pairs.
暂无评论