ISBN: 9781424439928 (print)
This paper introduces a fully automated, unsupervised method to recognise signs from subtitles. It does this by using data mining to align correspondences in sections of videos. Based on head and hand tracking, a novel temporally constrained adaptation of Apriori mining is used to extract similar regions of video, with the aid of a proposed contextual negative selection method. These regions are refined in the temporal domain to isolate the occurrences of similar signs in each example. The system is shown to automatically identify and segment signs from standard news broadcasts containing a variety of topics.
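As an illustration of the mining step named above, the following is a minimal sketch of plain Apriori frequent-itemset mining. It is not the paper's temporally constrained adaptation and it omits the contextual negative selection; the feature labels (`hand_up`, `hand_left`, `head_tilt`) and the idea of treating each video section as a set of quantised tracking features are assumptions made only for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical feature sets, one per video section (labels are made up).
sections = [
    {"hand_up", "hand_left", "head_tilt"},
    {"hand_up", "hand_left"},
    {"hand_up", "head_tilt"},
    {"hand_left", "head_tilt"},
    {"hand_up", "hand_left", "head_tilt"},
]
frequent_sets = apriori(sections, min_support=3)
```

With `min_support=3`, every single feature and every pair is frequent, while the full triple occurs in only two sections and is discarded.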
ISBN: 9781424439942 (print)
We propose an adaptive and effective multimodal peripheral-fovea sensor design for real-time target tracking. Inspired by biological vision systems, the design aims at real-time target detection and recognition with a hyperspectral/range fovea and a panoramic peripheral view. A realistic scene simulation approach is used to evaluate our sensor design and the related data exploitation algorithms before a real sensor is made. The goal is to reduce development time and system cost while achieving optimal results through an iterative process that incorporates simulation, sensing, processing and evaluation. Important issues such as multimodal sensory component integration, region of interest extraction, target tracking, hyperspectral image analysis and target signature identification are discussed.
ISBN: 9781424439942 (print)
This paper addresses the problem of developing facial image quality metrics that are predictive of the performance of existing biometric matching algorithms and incorporating the quality estimates into the recognition decision process to improve overall performance. The first task we consider is the separation of probe/gallery qualities, since the match score depends on both. Given a set of training images of the same individual, we find the match scores between all possible probe/gallery image pairs. Then, we define a symmetric normalized match score for any pair, model it as the average of the probe and gallery qualities corrupted by additive noise, and estimate the quality values such that the noise is minimized. To utilize quality in the decision process, we employ a Bayesian network to model the relationships among qualities, predefined quality-related image features and recognition. The recognition decision is made by probabilistic inference via this model. We illustrate with various face verification experiments that incorporating quality into the decision process can improve the performance significantly.
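The quality-separation step can be sketched as a small least-squares problem. This is an illustration under stated assumptions, not the authors' estimator: it assumes "noise is minimized" means minimizing the sum of squared residuals, and the function name `estimate_qualities` and the toy quality values are invented for the example.

```python
from itertools import combinations

import numpy as np


def estimate_qualities(scores):
    """Least-squares qualities q with scores[i, j] ~ (q[i] + q[j]) / 2.

    `scores` is a symmetric n x n matrix of normalized match scores; only
    the entries with i < j are used. At least three images are needed for
    the qualities to be uniquely determined.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    pairs = list(combinations(range(n), 2))

    # One equation per image pair: 0.5 * q_i + 0.5 * q_j = s_ij.
    A = np.zeros((len(pairs), n))
    b = np.empty(len(pairs))
    for row, (i, j) in enumerate(pairs):
        A[row, i] = 0.5
        A[row, j] = 0.5
        b[row] = scores[i, j]

    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q


# Toy example: build noise-free scores from made-up qualities, then recover them.
true_q = np.array([0.9, 0.5, 0.7, 0.2])
toy_scores = (true_q[:, None] + true_q[None, :]) / 2.0
recovered_q = estimate_qualities(toy_scores)
```

In the noise-free toy case the recovered values match the made-up qualities; with real match scores the residuals play the role of the additive noise term.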
ISBN: 9781424439942 (print)
Facial action provides various types of messages for human communication. Recognizing spontaneous facial actions, however, is very challenging due to subtle facial deformation, frequent head movements, and ambiguous and uncertain facial motion measurements. As a result, current research in facial action recognition is limited to posed facial actions, often in frontal view. Spontaneous facial action is characterized by rigid head movements and nonrigid facial muscular movements. More importantly, it is the spatiotemporal interactions among the rigid and nonrigid facial motions that produce meaningful and natural facial displays. Recognizing this fact, we introduce a probabilistic facial action model based on a dynamic Bayesian network (DBN) to simultaneously and coherently capture rigid and nonrigid facial motions, their spatiotemporal dependencies, and their image measurements. Advanced machine learning methods are introduced to learn the probabilistic facial action model based on both training data and prior knowledge. Facial action recognition is accomplished through probabilistic inference by systematically integrating measurements of facial motions with the facial action model. Experiments show that the proposed system yields significant improvements in recognizing spontaneous facial actions.
ISBN: 9781424439928 (print)
Efficient view registration with respect to a given 3D reconstruction has many applications, such as inside-out tracking in indoor and outdoor environments and geo-locating images from large photo collections. We present a fast location recognition technique based on structure from motion point clouds. Vocabulary tree-based indexing of features directly returns relevant fragments of 3D models instead of documents from the image database. Additionally, we propose a compressed 3D scene representation which improves recognition rates while simultaneously reducing the computation time and the memory consumption. The design of our method is based on algorithms that efficiently utilize modern graphics processing units to deliver real-time performance for view registration. We demonstrate the approach by matching hand-held outdoor videos to known 3D urban models, and by registering images from online photo collections to the corresponding landmarks.
ISBN: 9781424439928 (print)
This paper addresses two critical but rarely addressed issues in 2D face recognition: wider-range tolerance to pose variation and to misalignment. We propose a new Textural Hausdorff Distance (THD), which is a compound measurement integrating both spatial and textural features. The THD is applied to a Significant Jet Point (SJP) representation of face images, where a varied number of shape-driven SJPs are detected automatically from a low-level edge map with rich information content. Comparative experiments conducted on the publicly available FERET and AR face databases demonstrate that the proposed approach has a considerably wider range of tolerance against both in-depth head rotation and face misalignment.
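For readers unfamiliar with the underlying measure, here is a minimal sketch of the classical (purely spatial) Hausdorff distance between two point sets. It is not the Textural Hausdorff Distance proposed in the abstract, which additionally integrates textural features; the point coordinates below are arbitrary illustration values.

```python
import numpy as np


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).max()


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two small 2D point sets (arbitrary values for illustration).
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (3.0, 0.0)]
dist_ab = directed_hausdorff(set_a, set_b)  # 1.0
dist_ba = directed_hausdorff(set_b, set_a)  # 2.0
dist = hausdorff(set_a, set_b)              # 2.0
```

The two point sets may have different sizes, which is what makes this family of distances a natural fit for representations with a varying number of detected points.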
ISBN: 9781424439942 (print)
In this paper we address the problem of localisation and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity which relies on the spatiotemporal localization of characteristic, sparse, 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, where we take the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur, are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use Mean Shift Mode estimation in order to spatially segment the subject that performs the activities in every frame, and the Radon transform in order to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
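The mode-estimation step can be illustrated with a minimal mean shift sketch that climbs to a density mode of a set of 2D votes. The Gaussian kernel, the bandwidth, and the vote coordinates are assumptions chosen for this example; the abstract does not state which kernel or parameters the authors use.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Iterate the Gaussian-kernel mean shift update from `start` to a mode."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Gaussian weight of every point relative to the current estimate.
        sq_dist = np.sum((points - x) ** 2, axis=1)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Move to the weighted mean of the points.
        new_x = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical votes for the subject centre: a cluster near (5, 5) plus one outlier.
votes = [(4.0, 5.0), (6.0, 5.0), (5.0, 4.0), (5.0, 6.0), (5.0, 5.0), (20.0, 20.0)]
mode = mean_shift_mode(votes, start=(4.5, 4.5))
```

Because each update is a kernel-weighted mean, the distant outlier receives a negligible weight and the estimate settles on the dense cluster rather than on the global average of the votes.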
ISBN: 9781424439928 (print)
We present a general technique for rectification of a stereo pair acquired by a calibrated omnidirectional camera. Using this technique we formulate a new stereographic rectification method. Our rectification does not map epipolar curves onto lines, as common rectification methods do, but rather maps epipolar curves onto circles. We show that this rectification in a certain sense minimizes the distortion of the original omnidirectional images. We formulate the rectification for multiple images and show that the choice of the optimal projection center of the rectification is under certain circumstances equivalent to the classical problem of spherical minimax location. We demonstrate the behaviour and the quality of the rectification in real experiments with images from fisheye lenses with a 180-degree field of view.
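The circle-preserving property that motivates a stereographic mapping can be checked numerically. The sketch below is not the authors' rectification; it only projects a great circle of the unit sphere (standing in for an epipolar curve on the viewing sphere) from the north pole onto the plane and verifies that the image is a circle. The choice of projection centre, the particular great circle, and the helper names are assumptions for illustration.

```python
import numpy as np


def stereographic(points):
    """Project unit-sphere points from the pole (0, 0, 1) onto the plane z = 0."""
    p = np.asarray(points, dtype=float)
    return p[:, :2] / (1.0 - p[:, 2:3])


def fit_circle(xy):
    """Algebraic least-squares fit of x^2 + y^2 + D*x + E*y + F = 0."""
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    b = -(xy[:, 0] ** 2 + xy[:, 1] ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = np.array([-D / 2.0, -E / 2.0])
    radius = np.sqrt(centre @ centre - F)
    return centre, radius


# Great circle in the plane with normal (1, 0, 1) / sqrt(2); it avoids the pole.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
u = np.array([0.0, 1.0, 0.0])
v = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)
great_circle = np.cos(t)[:, None] * u + np.sin(t)[:, None] * v

projected = stereographic(great_circle)
centre, radius = fit_circle(projected)
# Deviation of each projected point from the fitted circle.
residual = np.abs(np.linalg.norm(projected - centre, axis=1) - radius)
```

For this particular great circle the projected points lie on a circle centred at (-1, 0) with radius sqrt(2), and the residuals are at floating-point level.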
ISBN: 9781424439942 (print)
This paper presents a method that considers not only patch appearances, but also patch relationships in the form of adjectives and prepositions for natural scene recognition. Most existing scene categorization approaches use only patch appearances or the co-occurrence of patch appearances to determine the scene categories, and ignore the relationships among patches. Those relationships are, however, critical for recognition and understanding. For example, a 'beach' scene can be characterized by a 'sky' region above 'sand', and a 'water' region between 'sky' and 'sand'. We believe that exploiting such relations between image regions can improve scene recognition. In our approach, each image is represented as a spatial pyramid, from which we obtain a collection of patch appearances with spatial layout information. We apply a feature mining approach to get discriminative patch combinations. The mined patch combinations can be interpreted as adjectives or prepositions, which are used for scene understanding and recognition. Experimental results on a fifteen-class scene dataset show that our approach achieves recognition accuracy competitive with the state of the art, while providing a rich description of the scene classes in terms of the mined adjectives and prepositions.
ISBN: 9781424439928 (print)
We present a system that combines multiple visual navigation techniques to achieve GPS-denied, non-line-of-sight SLAM capability for heterogeneous platforms. Our approach builds on several layers of vision algorithms, including sparse frame-to-frame structure from motion (visual odometry), a Kalman filter for fusion with inertial measurement unit (IMU) data and a distributed visual landmark matching capability with geometric consistency verification. We apply these techniques to implement a tag-along robot, where a human operator leads the way and a robot autonomously follows. We show results for a real-time implementation of such a system with real field constraints on CPU power and network resources.
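The fusion layer can be illustrated with a toy one-dimensional Kalman filter in which an IMU acceleration drives the prediction and a visual-odometry position corrects it. The state layout (position, velocity), the noise values `q` and `r`, and the function name `kalman_step` are assumptions for this sketch; the abstract does not describe the actual filter design.

```python
import numpy as np


def kalman_step(x, P, accel, z, dt, q=1e-3, r=1e-2):
    """One predict/update cycle for a 1D [position, velocity] state.

    accel: IMU acceleration used as the control input (assumed noise-free here).
    z:     position measurement, e.g. integrated visual odometry.
    q, r:  assumed process and measurement noise variances.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    B = np.array([0.5 * dt ** 2, dt])       # effect of acceleration
    H = np.array([[1.0, 0.0]])              # only position is measured
    Q = q * np.eye(2)
    R = np.array([[r]])

    # Predict with the IMU input.
    x = F @ x + B * accel
    P = F @ P @ F.T + Q

    # Update with the visual position measurement.
    innovation = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ innovation
    P = (np.eye(2) - K @ H) @ P
    return x, P


# Toy run: constant acceleration of 1.0 with exact position measurements.
dt, accel = 0.1, 1.0
state, cov = np.array([0.0, 0.0]), np.eye(2)
for k in range(1, 51):
    t = k * dt
    state, cov = kalman_step(state, cov, accel, z=0.5 * accel * t ** 2, dt=dt)
```

In this idealised run the estimate follows the true trajectory, and the position variance after each update stays below the measurement variance `r`; a real system would use a higher-dimensional state and noisy inputs.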