We propose a new Hidden Markov Model with time-dependent states. Estimation of this model is shown to be as fast and easy as the estimation of regular HMMs. We demonstrate the usefulness and feasibility of such time-dependent HMMs with an application in which illegitimate access to personnel-only rooms in airports etc. can be distinguished from access by legitimate personnel, based on differences in the time of access or differences in the motion trajectories.
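The abstract does not spell out how time enters the model, so the following is only a minimal sketch under one assumed reading: an HMM whose transition matrix may depend on the time step. The function name `forward_log_likelihood` and the callback `trans_at` are hypothetical, introduced here for illustration. The sketch shows that the scaled forward recursion costs the same per step as for a regular HMM, which is consistent with the claim that estimation remains as fast.

```python
import math


def forward_log_likelihood(obs, init, trans_at, emit):
    """Log-likelihood of a discrete observation sequence (scaled forward pass).

    obs      : list of observation symbols (ints), length >= 1
    init     : initial state distribution, init[s]
    trans_at : hypothetical callback, t -> matrix A with A[i][j] the
               probability of moving from state i to state j at step t
               (ASSUMPTION: this is how time dependence enters the model)
    emit     : emission probabilities, emit[state][symbol]
    """
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(1, len(obs)):
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        # Same recursion as a regular HMM, but the matrix is looked up per step.
        A = trans_at(t)
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * emit[j][obs[t]]
            for j in range(n)
        ]
    final = sum(alpha)
    return log_lik + math.log(final) if final > 0.0 else float("-inf")
```

With two such models, one fitted to legitimate access sequences and one to other sequences, a new sequence could be scored under both and assigned to the model with the higher log-likelihood. That decision rule is an assumption of this sketch, not a detail given in the abstract.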
In this paper, we describe the use of Genetic Programming (GP) techniques to learn a visual feature detector for a mobile robot navigation task. We provide experimental results across a number of different environments, each with different characteristics, and draw conclusions about the performance of the learned feature detector. We also explore the utility of seeding the initial population with a previously evolved individual, and discuss the performance of the resulting individuals.
The problem of object recognition is addressed. In the literature this task has generally been considered from a "passive" perspective, where everything is static and there is no definite relation between the object and its environment. We propose an "active" approach to object recognition, based on the capability of the observer to move and obtain a better description of the object under consideration, and also to take advantage of the relations between the objects and the environment. This can be accomplished at the task level and at the sensor level. The face recognition problem, based on the face-space approach, is considered to demonstrate the advantage of adopting an active retina to sample the face, build a database and perform the recognition task. By using an active space-variant retina, the size of the database is considerably reduced, and consequently so is the processing time for recognition. A comparative experiment using the active and static approaches is presented.
In this paper, we address the problem of object class recognition via observations from actively selected views/modalities/features under limited resource budgets. A Partially Observable Markov Decision Process (POMDP) is employed to find optimal sensing and recognition actions with the goal of long-term classification accuracy. Heterogeneous resource constraints -- such as motion, number of measurements and bandwidth -- are explicitly modeled in the state variable, and a prohibitively high penalty is used to prevent the violation of any resource constraint. To improve recognition performance, we further incorporate discriminative classification models into the POMDP, and customize the reward function and observation model accordingly. The proposed model is validated on several data sets for multi-view, multi-modal vehicle classification and multi-view face recognition, and demonstrates improvement in both recognition and resource management over greedy methods and previous POMDP formulations.
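As a rough illustration of the two ingredients named above, the sketch below combines a Bayesian belief update over object classes with a budget that is tracked alongside the belief and a large penalty for overspending. The function names, the reward shape (negative action cost) and the penalty value are assumptions made for this example; the abstract does not give the actual state, reward or observation model.

```python
def update_belief(belief, likelihood):
    """Bayes update of a class belief after one observation.

    belief     : dict class -> prior probability
    likelihood : dict class -> P(observation | class, chosen sensing action)
    """
    unnormalized = {c: belief[c] * likelihood[c] for c in belief}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under every class")
    return {c: p / z for c, p in unnormalized.items()}


def sense(belief, budget, action_cost, likelihood, violation_penalty=-1e6):
    """Apply one sensing action while tracking a resource budget.

    Returns (new_belief, new_budget, reward). Both the reward shape and
    violation_penalty are hypothetical: an action that would exceed the
    budget is rejected with a prohibitively large negative reward,
    otherwise the reward is simply the negative cost of the action.
    """
    if action_cost > budget:
        return belief, budget, violation_penalty
    return update_belief(belief, likelihood), budget - action_cost, -action_cost
```

A planner would choose among sensing actions by looking ahead over such steps, whereas a greedy method would pick the action with the best immediate gain. Only the single-step bookkeeping is sketched here.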
Object tracking methods based on stereo cameras, which provide both color and depth data at each pixel, find advantage in separating objects from each other and from background, determining the 3D size and location of objects, and modeling object shape. However, stereo tracking methods to date sometimes fail due to depth image noise, and discard much useful appearance information. We propose augmenting stereo-based models of tracked objects with sparse local appearance features, which have recently been applied with great success to object recognition under pose variation and partial occlusion. Depth data complements sparse local features by informing correct assignment of features to objects, while tracking of stable local appearance features helps overcome distortion of object shape models due to depth noise and partial occlusion. To speed up tracking of many local features, we also use a "binary Gabor" representation that is highly descriptive yet efficiently computed using integral images. In addition, a novel online feature selection and pruning technique is described to focus tracking onto the best localized and most consistent features. A tracking framework fusing all of these aspects is provided, and results for challenging video sequences are discussed.
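The efficiency claim rests on integral images, which allow the sum over any axis-aligned box to be read off with four lookups. The sketch below shows that mechanism only. The helper `box_difference_bit` is a hypothetical stand-in for a binarized box-filter response and is not the descriptor used in the paper, whose exact definition the abstract does not give.

```python
def integral_image(img):
    """Build an integral image with one row and column of zero padding.

    img is a list of equal-length rows of numbers. Entry ii[y][x] holds
    the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows [top, bottom) and columns [left, right)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def box_difference_bit(ii, top, left, height, width):
    """Hypothetical binary feature: 1 if the left half of a box is
    brighter than the right half, else 0. Illustrative only."""
    half = width // 2
    left_sum = box_sum(ii, top, left, top + height, left + half)
    right_sum = box_sum(ii, top, left + half, top + height, left + 2 * half)
    return 1 if left_sum > right_sum else 0
```

Each call to `box_sum` is constant-time regardless of box size, which is what makes evaluating many box-based features per frame affordable.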
ISBN: (print) 9781467367608
In this paper, we address the problem of human pose estimation through a novel articulated Gaussian kernel correlation function which is applied to human pose tracking from a single depth sensor. We first derive a unified Gaussian kernel correlation that can generalize the previous Sum-of-Gaussians (SoG)-based methods for the similarity measure between a template and the observation. Furthermore, we develop an articulated Gaussian kernel correlation by embedding a tree-structured skeleton model, which enables us to estimate the full-body pose parameters. Also, the new kernel correlation framework can easily penalize undesired body intersection which is more natural than the clamping function in previous methods. Our algorithm is general, simple yet effective and can achieve real-time performance. The experimental results on a public depth dataset are promising and competitive when compared with state-of-the-art algorithms.
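For intuition, the similarity between two sums of isotropic Gaussians has a closed form, because the integral of a product of two Gaussians is itself a Gaussian in the distance between their centers. The sketch below uses unnormalized 3D kernels exp(-|x - mu|^2 / (2 sigma^2)) and that standard closed form. It is an assumption that this matches the kernel used in the paper, and the skeleton embedding and the intersection penalty are not modeled; the function names are illustrative.

```python
import math


def gaussian_pair_correlation(mu_a, sigma_a, mu_b, sigma_b):
    """Integral over R^3 of the product of two unnormalized isotropic
    Gaussians exp(-|x - mu|^2 / (2 sigma^2))."""
    var_sum = sigma_a ** 2 + sigma_b ** 2
    dist_sq = sum((p - q) ** 2 for p, q in zip(mu_a, mu_b))
    scale = (2.0 * math.pi * sigma_a ** 2 * sigma_b ** 2 / var_sum) ** 1.5
    return scale * math.exp(-dist_sq / (2.0 * var_sum))


def kernel_correlation(template, observation):
    """Sum of pairwise correlations between two Gaussian mixtures.

    Each argument is a list of (center, sigma) pairs, where center is an
    (x, y, z) tuple. Mixture weights are taken as 1 for simplicity.
    """
    return sum(
        gaussian_pair_correlation(mu_t, s_t, mu_o, s_o)
        for mu_t, s_t in template
        for mu_o, s_o in observation
    )
```

In a tracker, the centers of the template Gaussians would be functions of the pose parameters, and the pose would be chosen to maximize this correlation against the Gaussians fitted to the depth observation.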
The past few years have seen improvements in the frame rate, accuracy, cost, and size of 3D sensors, including systems based on stereo vision, time-of-flight, structured light, and depth-from-X methods. These exciting developments have sparked increasing interest in industry and academia in the challenges and opportunities afforded by real-time 3D sensing. Applications such as tracking, recognition, and scene understanding may become more robust and more feasible thanks to these sensors.
Various facial region biometrics have been used extensively in the areas of recognition and authentication. However, some regions of the face provide more information than is currently being fully utilized in these specific capacities. Biometrics associated exclusively with the eye region hold a key to identifying and classifying particular affective and cognitive states. This paper focuses on 1) methods for identifying and deriving the biometric data inherent to the eye region that are most useful in specific HCI scenarios, and 2) outlining a framework for classifying these biometric data into affective and cognitive states relative to a particular HCI context.
We address the problem of both estimating the dominant person in a meeting from a single audio source and identifying them visually in a multi-camera setting. We use a speaker diarization algorithm to perform speaker segmentation and clustering, which indicates when each participant spoke. Using a greedy ordered audio-visual association algorithm, we investigate using the speaker clusters to find the corresponding person in one of the video channels. The problem is difficult for two reasons. First, the speaker diarization output is noisy (e.g. for participants who speak little) and often produces a number of clusters that differs from the true number of participants. Second, personal visual activity from natural upper torso motion, which can include highly deformable pose changes and perspective distortion, is computed through computationally efficient coarse features. Our results using almost 2 hours of audio-visual data from 4-participant meetings show a strong correlation between the estimated speaker diarization and visual activity features, enabling the identification of the most dominant person as a pair of audiovisual channels.
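One plausible way to realize a greedy ordered association is sketched below: speaker clusters are visited from most to least total speaking time, and each takes the still-unassigned video channel whose visual activity correlates best with its speaking activity. The ordering criterion, the use of Pearson correlation and all names are assumptions for illustration; the abstract does not specify them.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def greedy_associate(speaker_activity, visual_activity):
    """Pair speaker clusters with video channels, most talkative first.

    speaker_activity : dict cluster id -> per-frame speaking indicator
    visual_activity  : dict channel id -> per-frame visual activity
    Returns a list of (cluster, channel) pairs; the first pair belongs to
    the cluster with the most speaking time (ASSUMED proxy for dominance).
    """
    order = sorted(
        speaker_activity, key=lambda k: sum(speaker_activity[k]), reverse=True
    )
    free = set(visual_activity)
    pairs = []
    for cluster in order:
        if not free:
            break  # more clusters than channels: leave the rest unassigned
        best = max(
            sorted(free),
            key=lambda ch: pearson(speaker_activity[cluster], visual_activity[ch]),
        )
        pairs.append((cluster, best))
        free.remove(best)
    return pairs
```

Because the number of diarization clusters need not equal the number of participants, the loop simply stops when channels run out, and unused channels stay unpaired when there are too few clusters.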
In recent years, the influences of design patterns on software quality have attracted increasing attention in the area of software engineering, as design patterns encapsulate valuable knowledge to resolve design probl...