ISBN: (Print) 9781424439942
Temporal segmentation of human motion into actions is central to the understanding and building of computational models of human motion and activity recognition. Several issues contribute to the challenge of temporal segmentation and classification of human motion. These include the large variability in the temporal scale and periodicity of human actions, the complexity of representing articulated motion, and the exponential nature of all possible movement combinations. We provide initial results from investigating two distinct problems: classification of the overall task being performed, and the more difficult problem of classifying individual frames over time into specific actions. We explore first-person sensing through a wearable camera and Inertial Measurement Units (IMUs) for temporally segmenting human motion into actions and performing activity classification in the context of cooking and recipe preparation in a natural environment. We present baseline results for supervised and unsupervised temporal segmentation, and for recipe recognition, on the CMU Multimodal Activity database (CMU-MMAC).
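As a concrete, hedged illustration of what unsupervised temporal segmentation of the IMU stream can look like, the sketch below splits an accelerometer signal wherever its short-term motion energy changes sharply. This is a generic stand-in, not the CMU-MMAC baseline method; the window size, threshold, and data layout are assumptions.

    # Illustrative sketch only: a simple energy-based change-point segmentation of an
    # IMU accelerometer stream. All names and thresholds below are assumptions.
    import numpy as np

    def segment_imu(accel, window=30, threshold=1.5):
        """Split an (N, 3) accelerometer stream into segments wherever the
        short-term variance of the motion magnitude changes sharply."""
        mag = np.linalg.norm(accel, axis=1)
        # short-term variance in a sliding window
        var = np.array([mag[max(0, i - window):i + window].var()
                        for i in range(len(mag))])
        boundaries = [0]
        for i in range(window, len(var) - window):
            prev = var[i - window:i].mean() + 1e-8
            nxt = var[i:i + window].mean() + 1e-8
            ratio = max(prev, nxt) / min(prev, nxt)
            if ratio > threshold and i - boundaries[-1] > window:
                boundaries.append(i)
        boundaries.append(len(mag))
        return list(zip(boundaries[:-1], boundaries[1:]))

    # Example: three synthetic "actions" with different motion intensity.
    rng = np.random.default_rng(0)
    stream = np.concatenate([rng.normal(0, s, (300, 3)) for s in (0.2, 1.0, 0.4)])
    print(segment_imu(stream))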
ISBN: (Print) 9781424439942
We demonstrate that it is possible to automatically find representative example images of a specified object category. These canonical examples are perhaps the kind of images that one would show a child to teach them what, for example, a horse is: images with a large object clearly separated from the background. Given a large collection of images returned by a web search for an object category, our approach proceeds without any user-supplied training data for the category. First, images are ranked according to a category-independent composition model that predicts whether they contain a large, clearly depicted object, and outputs an estimated location of that object. Then, local features calculated on the proposed object regions are used to eliminate images not distinctive to the category, and to cluster images by similarity of object appearance. We present results and a user evaluation on a variety of object categories, demonstrating the effectiveness of the approach.
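A minimal sketch of the final step described above: cluster per-image object-region descriptors and propose the images nearest the largest cluster's centroid as canonical candidates. The composition model and the actual features are not reproduced; the descriptors here are assumed to be precomputed vectors.

    # Cluster object-region descriptors and rank candidates for a "canonical" image.
    import numpy as np
    from sklearn.cluster import KMeans

    def canonical_indices(descriptors, n_clusters=5):
        """descriptors: (n_images, d) array of per-image object-region features."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
        # largest cluster = most typical object appearance for the category
        largest = np.bincount(km.labels_).argmax()
        members = np.where(km.labels_ == largest)[0]
        dists = np.linalg.norm(descriptors[members] - km.cluster_centers_[largest], axis=1)
        return members[np.argsort(dists)]   # best canonical candidates first

    # Usage with random stand-in descriptors:
    feats = np.random.default_rng(1).normal(size=(200, 128))
    print(canonical_indices(feats)[:5])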
ISBN: (Print) 9781424439942
To address the challenges of non-cooperative, large-distance human signature detection, we present a novel multimodal remote audio/video acquisition system. The system mainly consists of a laser Doppler vibrometer (LDV) and a pan-tilt-zoom (PTZ) camera. The LDV is a unique remote hearing sensor based on the principle of laser interferometry. However, it needs an appropriate surface to modulate the speech of a human subject and reflect the laser beam back to the LDV receiver. Manually aiming the laser beam at a target is very difficult at distances of more than 20 meters. Therefore, the PTZ camera is used to capture video of the human subject, track the subject when he/she moves, and analyze the image in real time to find a good reflection surface for LDV measurements. Experiments show that the integration of these two sensory components is ideal for multimodal human signature detection at a large distance.
ISBN: (Print) 9781424439942
An algorithm is proposed for the 3D modeling of static scenes based solely on the range and intensity data acquired by a Time-of-Flight camera during an arbitrary movement. No additional scene acquisition devices, such as inertial sensors, positioning robots, or intensity-based cameras, are incorporated. The current pose is estimated by maximizing the uncentered correlation coefficient between edges detected in the current and a preceding frame, at a minimum frame rate of four fps and an average accuracy of 45 mm. The paper also describes several extensions for robust registration, such as multiresolution hierarchies and a projection Iterative Closest Point algorithm. The basic registration algorithm and its extensions were extensively evaluated against ground-truth data to validate their accuracy, robustness, and real-time capability.
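The similarity measure named above, the uncentered correlation coefficient between two edge maps, can be written in a few lines. The sketch below shows only the score; the surrounding pose search that warps one frame by each candidate pose before scoring depends on the ToF camera model and is omitted.

    # Uncentered correlation coefficient between two edge images of identical shape.
    import numpy as np

    def uncentered_correlation(edges_a, edges_b):
        """Cosine-style similarity of two edge maps, in [0, 1] for non-negative
        inputs (1 = identical edge structure)."""
        a = edges_a.ravel().astype(float)
        b = edges_b.ravel().astype(float)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    # Example: a shifted copy scores lower than a perfect match.
    img = np.zeros((64, 64)); img[20:44, 30] = 1.0          # a thin vertical edge
    print(uncentered_correlation(img, img))                  # -> 1.0
    print(uncentered_correlation(img, np.roll(img, 3, 1)))   # -> 0.0 (edge no longer overlaps)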
ISBN: (Print) 9781424439942
The number of digital images that need to be acquired, analyzed, classified, stored, and retrieved in medical centers is growing exponentially with the advances in medical imaging technologies. Accordingly, medical image classification and retrieval has become a popular topic in recent years. Despite many projects focusing on this problem, proposed solutions are still far from being sufficiently accurate for real-life implementations. Interpreting medical image classification and retrieval as a multi-class classification task, in this work we investigate the performance of five different feature types in an SVM-based learning framework for classification of human body X-ray images into classes corresponding to body parts. Our comprehensive experiments show that four conventional feature types provide performances comparable to the literature but with low per-class accuracies, whereas local binary patterns produce not only very good global accuracy but also good class-specific accuracies with respect to the features used in the literature.
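A hedged sketch of the best-performing configuration described above: uniform local binary pattern (LBP) histograms fed to a linear SVM. Data loading, the other four feature types, and the evaluation protocol are omitted; the image arrays, labels, and LBP parameters below are assumptions for illustration.

    # LBP histogram features + linear SVM for body-part classification (sketch).
    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.svm import LinearSVC

    def lbp_histogram(gray_image, points=8, radius=1):
        """Histogram of uniform LBP codes as a global texture descriptor."""
        codes = local_binary_pattern(gray_image, points, radius, method="uniform")
        n_bins = points + 2                      # uniform patterns + one non-uniform bin
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        return hist

    def train_bodypart_classifier(images, labels):
        """images: list of 2-D grayscale X-ray arrays; labels: body-part ids."""
        features = np.array([lbp_histogram(img) for img in images])
        return LinearSVC(C=1.0, max_iter=5000).fit(features, labels)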
ISBN: (Print) 9781424439942
In this paper, we focus on face recognition over image sets, where each set is represented by a linear subspace. Linear Discriminant Analysis (LDA) is adopted for discriminative learning. After investigating the relation between regularization of the Fisher Criterion and the Maximum Margin Criterion, we present a unified framework for regularized LDA. Within this framework, the ratio-form maximization of regularized Fisher LDA can be reduced to a difference-form optimization with an additional constraint. By incorporating the empirical loss as the regularization term, we introduce a generalized Square Loss based Regularized LDA (SLR-LDA) with suggestions on parameter setting. Our approach achieves performance superior to state-of-the-art methods on face recognition. Its effectiveness is also clearly verified in general object and object category recognition experiments.
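To make the regularization of the Fisher criterion concrete, the sketch below implements plain regularized Fisher LDA, where the projection maximizes the ratio w'S_b w / w'(S_w + lam*I) w via a generalized eigenproblem. This is standard regularized LDA, not the paper's SLR-LDA (square-loss regularizer with a difference-form constraint); lam and the data layout are illustrative assumptions.

    # Regularized Fisher LDA via a generalized eigenproblem (sketch).
    import numpy as np
    from scipy.linalg import eigh

    def regularized_lda(X, y, n_components, lam=1e-2):
        """X: (n_samples, d); y: integer class labels. Returns a (d, n_components) W."""
        mean = X.mean(axis=0)
        d = X.shape[1]
        Sw = np.zeros((d, d)); Sb = np.zeros((d, d))
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)                       # within-class scatter
            Sb += len(Xc) * np.outer(mc - mean, mc - mean)      # between-class scatter
        # generalized eigenproblem: Sb w = eig * (Sw + lam*I) w
        vals, vecs = eigh(Sb, Sw + lam * np.eye(d))
        order = np.argsort(vals)[::-1]                          # largest ratios first
        return vecs[:, order[:n_components]]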
ISBN: (Print) 9781424439942
Matching vehicles subject to both large pose transformations and extreme illumination variations remains a technically challenging problem in computer vision. In this paper, we develop a new and robust framework for matching and recognizing vehicles with both highly varying poses and drastically changing illumination conditions. By effectively estimating both pose and illumination conditions, we can re-render the vehicle in the reference image to generate a relit image with the same pose and illumination conditions as the target image. We compare the relit image and the re-rendered target image to match vehicles in the original reference image and target image. Furthermore, no training is needed in our framework, and re-rendered vehicle images in other viewpoints and illumination conditions can be obtained from just a single input image. Experimental results demonstrate the robustness and efficacy of our framework, with the potential to generalize our current method from vehicles to other types of objects.
ISBN: (Print) 9781424439942
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at the score level is carried out using two approaches: weighted mean and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows assessing the performance of the presented algorithms. This dataset is made publicly available for research purposes.
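The simpler of the two fusion rules mentioned above, the weighted mean of per-class audio and video scores, is sketched below; the fuzzy integral is not reproduced. The weights, the normalization of scores to [0, 1], and the example class names are assumptions for illustration.

    # Score-level fusion by weighted mean of audio and video detection scores.
    import numpy as np

    def fuse_scores(audio_scores, video_scores, w_audio=0.6, w_video=0.4):
        """Each argument: (n_classes,) array of detection scores for one time window."""
        fused = w_audio * np.asarray(audio_scores) + w_video * np.asarray(video_scores)
        return int(np.argmax(fused)), fused

    # Example: audio slightly favors class 0, but video tips the decision to class 1.
    label, scores = fuse_scores([0.55, 0.45], [0.20, 0.80])
    print(label, scores)    # -> 1, [0.41 0.59]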
ISBN: (Print) 9781424439942
Faces are highly deformable objects which may easily change their appearance over time. Not all face areas are subject to the same variability. Therefore, decoupling the information from independent areas of the face is of paramount importance for improving the robustness of any face recognition technique. This paper presents a robust face recognition technique based on the extraction and matching of SIFT features related to independent face areas. Both a global and a local (recognition from parts) matching strategy are proposed. The local strategy is based on matching individual salient facial SIFT features connected to facial landmarks such as the eyes and the mouth. In the global matching strategy, all SIFT features are combined to form a single feature. In order to reduce identification errors, Dempster-Shafer decision theory is applied to fuse the two matching techniques. The proposed algorithms are evaluated on the ORL and IITK face databases. The experimental results demonstrate the effectiveness and potential of the proposed face recognition techniques, also in the case of partially occluded faces or missing information.
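Dempster's rule of combination, used above to fuse the global and local matchers, is sketched below on the simple frame {genuine, impostor} with an explicit "unknown" (whole-frame) mass. How the SIFT matching scores are mapped to these mass values is an assumption for illustration, not the paper's exact scheme.

    # Dempster's rule of combination for two matchers on the frame {genuine, impostor}.
    def dempster_combine(m1, m2):
        """m1, m2: dicts with masses for 'genuine', 'impostor', 'unknown'
        (the 'unknown' mass is assigned to the whole frame). Masses sum to 1."""
        conflict = m1["genuine"] * m2["impostor"] + m1["impostor"] * m2["genuine"]
        k = 1.0 - conflict                              # normalization constant
        genuine = (m1["genuine"] * m2["genuine"]
                   + m1["genuine"] * m2["unknown"]
                   + m1["unknown"] * m2["genuine"]) / k
        impostor = (m1["impostor"] * m2["impostor"]
                    + m1["impostor"] * m2["unknown"]
                    + m1["unknown"] * m2["impostor"]) / k
        unknown = (m1["unknown"] * m2["unknown"]) / k
        return {"genuine": genuine, "impostor": impostor, "unknown": unknown}

    # Global matcher fairly confident, local matcher undecided:
    print(dempster_combine({"genuine": 0.7, "impostor": 0.1, "unknown": 0.2},
                           {"genuine": 0.4, "impostor": 0.2, "unknown": 0.4}))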
ISBN: (Print) 9781424439942
The paper presents a study on color-to-gray image conversion from a novel point of view: face detection. To the best knowledge of the authors, research on this specific topic has not been conducted before. Our work reveals that the standard NTSC conversion is not optimal for face detection tasks, although it may be the best choice for displaying pictures on monochrome televisions. It is further found experimentally, with two AdaBoost-based face detection systems, that the detection rates may vary by up to 10% simply by changing the parameters of the RGB-to-gray conversion. On the other hand, the change has little influence on the false positive rates. Compared to the standard NTSC conversion, the detection rate with the best found parameter setting is 2.85% and 3.58% higher for the two evaluated face detection systems. Promisingly, the work suggests a new approach to color-to-gray conversion. It can easily be incorporated into most existing face detection systems for accuracy improvement without introducing any extra computational cost.
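The parameterized conversion studied above amounts to a weighted sum of the R, G, and B channels. The sketch below uses the standard NTSC/Rec. 601 weights as the baseline; the paper's best-found weights are not reproduced, so the weight triple is a placeholder to be tuned on a validation set.

    # Parameterized RGB-to-gray conversion (sketch).
    import numpy as np

    NTSC_WEIGHTS = (0.299, 0.587, 0.114)          # standard luma coefficients

    def rgb_to_gray(image, weights=NTSC_WEIGHTS):
        """image: (H, W, 3) RGB array. Weights are normalized to sum to 1 so the
        output stays in the input's intensity range."""
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        return image[..., :3].astype(float) @ w

    # A detector can then be run on rgb_to_gray(img, weights=candidate)
    # for each candidate weight triple during the parameter search.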