Proposed is a pixel-pattern-based texture feature (PPBTF) for real-time facial expression recognition. Grey-scale images are transformed into pattern maps where edges and lines are used for characterising facial texture. Based on the pattern map, a feature vector is constructed. AdaBoost and a support vector machine (SVM) are adopted. Experiments on the Cohn-Kanade database illustrate that the PPBTF is effective and efficient for facial expression recognition.
ISBN: 9781424411795 (print)
In this paper, we present a probabilistic formulation of kernel-based tracking methods based upon maximum likelihood estimation. To this end, we view the coordinates of the pixels in both the target model and its candidate as random variables and use a generative model to cast the tracking task into a maximum likelihood framework. This, in turn, permits the use of the EM algorithm to estimate a set of latent variables that can be used to update the target-center position. Once the latent variables have been estimated, we use the Kullback-Leibler divergence to minimise the mutual information between the target model and candidate distributions, which yields a target-center update rule and a kernel bandwidth adjustment scheme. The method is very general in nature. We illustrate the utility of our approach for tracking on real-world video sequences using two alternative kernel functions.
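As a rough illustration of the quantity named in this abstract, the sketch below computes the Kullback-Leibler divergence between two discrete distributions. It assumes the target model and the candidate are each summarised as a histogram of non-negative weights; that representation, the function name `kl_divergence` and the `eps` guard are assumptions made for the example, and the code is not the update rule derived in the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions.

    p and q are equal-length sequences of non-negative weights
    (for example, histograms). They are normalised here, so they
    do not need to sum to one on input. eps is an illustrative
    guard against division by zero when a bin of q is empty.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_sum, q_sum = sum(p), sum(q)
    if p_sum <= 0 or q_sum <= 0:
        raise ValueError("each distribution needs positive total mass")

    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i /= p_sum
        q_i /= q_sum
        if p_i > 0:  # bins with zero mass in p contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Hypothetical target and candidate histograms.
target = [4, 3, 2, 1]
candidate = [1, 2, 3, 4]
print(kl_divergence(target, target))     # identical distributions give 0.0
print(kl_divergence(target, candidate))  # a positive value
```

The divergence is zero only when the two normalised histograms coincide, which is why it can serve as a mismatch score between a model and a candidate.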
ISBN: 9781424411795 (print)
We describe a probabilistic framework for recognizing human activities in monocular video based on simple silhouette observations. The methodology combines kernel principal component analysis (KPCA) based feature extraction and factorial conditional random field (FCRF) based motion modeling. Silhouette data is represented more compactly by nonlinear dimensionality reduction that explores the underlying structure of the articulated action space and preserves explicit temporal orders in projection trajectories of motions. FCRF models temporal sequences in multiple interacting ways, thus increasing joint accuracy by information sharing, with the advantages of discriminative models over generative ones (e.g., relaxing the independence assumption between observations and the ability to effectively incorporate both overlapping features and long-range dependencies). The experimental results on two recent datasets show that the proposed framework not only accurately recognizes human activities with temporal, intra- and inter-person variations, but is also considerably robust to noise and other factors such as partial occlusion and irregularities in motion styles.
ISBN: 9781424411795 (print)
The goal of interest point detectors is to find, in an unsupervised way, keypoints that are easy to extract and at the same time robust to image transformations. We present a novel set of saliency features based on image singularities that takes into account the region content in terms of intensity and local structure. The region complexity is estimated by means of the entropy of the grey-level information; shape information is obtained by measuring the entropy of significant orientations. The regions are located at their representative scale and categorized by their complexity level. Thus the regions are highly discriminable and less sensitive to confusion and false alarms than traditional approaches. We compare the novel complex salient regions with state-of-the-art keypoint detectors. The presented interest points show robustness to a wide set of image transformations and high repeatability, and allow matching from different camera points of view. In addition, we show the temporal robustness of the novel salient regions in real video sequences, making them potentially useful for matching, image retrieval and object categorization problems.
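The complexity measure mentioned in this abstract, the entropy of the grey-level information in a region, can be illustrated with a short sketch. It assumes the region is given as a flat sequence of integer grey levels and uses the Shannon entropy of their empirical histogram; the function name `grey_level_entropy` and the input format are choices made for the example, not details taken from the paper.

```python
import math
from collections import Counter


def grey_level_entropy(pixels):
    """Shannon entropy (in bits) of the grey levels in a region.

    pixels is a flat sequence of grey-level values. A uniform
    region scores 0 bits; the score grows as the grey levels
    become more evenly spread over many distinct values.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("region must contain at least one pixel")
    counts = Counter(pixels)
    # Sum p * log2(1 / p) over the grey levels that actually occur.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


flat_region = [7] * 16                   # a single grey level
textured_region = [0, 85, 170, 255] * 4  # four equally frequent levels
print(grey_level_entropy(flat_region))      # 0.0
print(grey_level_entropy(textured_region))  # 2.0
```

The same function could be applied to quantised gradient orientations to obtain an orientation entropy, although how the paper selects "significant" orientations is not stated in the abstract.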
ISBN: 9781424411795 (print)
Gait is a promising biometric cue which can facilitate the recognition of human beings, particularly when other biometrics are unavailable. Existing work on gait recognition, however, lays more emphasis on the problem of daytime walker recognition and overlooks the significance of walker recognition at night. This paper deals with the problem of recognizing nighttime walkers. We take advantage of infrared gait patterns to accomplish this task: 1) walker detection is improved using intensity compensation-based background subtraction; 2) pseudoshape-based features are proposed to describe gait patterns; 3) the dimension of gait features is reduced through the principal component analysis (PCA) and linear discriminant analysis (LDA) techniques; 4) temporal cues are exploited in the form of relevant component analysis (RCA) learning; 5) the nearest neighbor classifier is used to recognize unknown gait. Experimental results justify the effectiveness of our method and show that it has encouraging potential for application in surveillance systems.
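The final step of this pipeline, nearest neighbor classification, can be sketched as follows. The example assumes that each gait sample has already been reduced to a short feature vector by the earlier steps, and it uses Euclidean distance; the abstract does not state the distance measure, so that choice, the function name `nearest_neighbor` and the toy gallery are assumptions for illustration only.

```python
import math


def nearest_neighbor(query, gallery):
    """Return (label, distance) of the gallery entry closest to query.

    gallery is a list of (feature_vector, label) pairs; all vectors
    must have the same length as query. Euclidean distance is an
    assumption of this sketch.
    """
    if not gallery:
        raise ValueError("gallery must not be empty")
    best_label, best_dist = None, math.inf
    for features, label in gallery:
        dist = math.dist(query, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical reduced gait features for two enrolled walkers.
gallery = [
    ([0.0, 0.0], "walker_A"),
    ([3.0, 4.0], "walker_B"),
]
label, dist = nearest_neighbor([2.9, 4.1], gallery)
print(label)  # walker_B
```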
ISBN: 9780819469526 (print)
Wide baseline stereo correspondence has become a challenging and attractive problem in computer vision and its related applications. Obtaining initial matches with a high ratio of correct matches is a very important step in a general wide baseline stereo correspondence algorithm. Ferrari et al. suggested a voting scheme called the topological filter in [3] to discard mismatches from the initial matches, but they did not give a theoretical analysis of their method. Furthermore, the parameter of their scheme was uncertain. In this paper, we improve the method of Ferrari et al. based on our theoretical analysis and present a novel scheme called topologically clustering to discard mismatches. The proposed method has been tested on many well-known wide baseline image pairs, and the experimental results show that it can efficiently extract matches with a high correct ratio from initial matches with a low correct ratio.
ISBN: 9781424411795 (print)
A new method for localising and recognising hand poses and objects in real time is presented. This problem is important in vision-driven applications where it is natural for a user to combine hand gestures and real objects when interacting with a machine. Examples include using a real eraser to remove words from a document displayed on an electronic surface. In this paper the task of simultaneously recognising object classes, hand gestures and detecting touch events is cast as a single classification problem. A random forest algorithm is employed which adaptively selects and combines a minimal set of appearance, shape and stereo features to achieve maximum class discrimination for a given image. This minimal set leads to both efficiency at run time and good generalisation. Unlike previous stereo work that explicitly constructs disparity maps, here the stereo matching costs are used directly as a visual cue and are only computed on demand, i.e. only for pixels where they are necessary for recognition. This leads to improved efficiency. The proposed method is assessed on a database of a variety of objects and hand poses selected for interacting on a flat surface in an office environment.
ISBN: 9780819469502 (print)
Monocular vision is widely used in the motion control of mobile robots because of its simple structure and ease of use. The major topic of this paper is an integrated description of how to distinguish and track specified color targets dynamically and precisely with monocular vision, based on the imaging principle. The approach strictly follows the mechanisms of visual processing, including the preprocessing and recognition stages. In particular, color models are used to reduce the influence of illumination. Several algorithms drawn from practical applications are used for image segmentation and clustering. Because the monocular camera cannot obtain depth information directly, once the target has been recognized the 3D reconstruction principle is used to calculate the distance and direction from the robot to the target. To correct the monocular camera reading, a laser is used after the vision measurement. Finally, a visual servo system is designed to realize the robot's dynamic tracking of the moving target.
ISBN: 9781424411795 (print)
Local image descriptors have proved themselves as useful tools for many computer vision tasks such as matching points between multiple images of a scene and object recognition. Current descriptors, such as SIFT, are designed to match image features with unique local neighborhoods. However, the interest point detectors used with SIFT often fail to select perceptible local structures in the image, and the SIFT descriptor does not directly encode the local neighborhood shape. In this paper we propose a symmetry-based interest point detector and radial local structure descriptor which consistently captures the majority of basic local image structures and provides a geometrical description of the structure boundaries. This approach concentrates on the extraction of shape properties in image patches, which are an intuitive way to represent local appearance for matching and classification. We explore the specificity and sensitivity of this local descriptor in the context of classification of natural patterns. The implications of the performance comparison with standard approaches like SIFT are discussed.
ISBN: 9781424411795 (print)
We present a computer vision system for robust object tracking in 3D by combining evidence from multiple calibrated cameras. This kernel-based 3D tracker is automatically bootstrapped by constructing 3D point clouds. These point clouds are then clustered and used to initialize the trackers and validate their performance. The framework describes a complete tracking system that fuses appearance features from all available camera sensors and is capable of automatic initialization and drift detection. Its elegance resides in its inherent ability to handle problems encountered by various 2D trackers, including scale selection, occlusion, view-dependence, and correspondence across views. Tracking results for an indoor smart room and a multi-camera outdoor surveillance scenario are presented. We demonstrate the effectiveness of this unified approach by comparing its performance to a baseline 3D tracker that fuses results of independent 2D trackers, as well as comparing the re-initialization results to known ground truth.