ISBN: 9781424439942 (print)
Laughter detection is an important area of interest in the Affective Computing and Human-Computer Interaction fields. In this paper we propose a multi-modal methodology, based on the fusion of audio and visual cues, to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectrogram, and the video features are obtained by estimating the degree of mouth movement and by using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results on videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology significantly outperforms the results obtained by an AdaBoost classifier.
ISBN: 9781424439942 (print)
Contextual models play a very important role in the task of object recognition. Over the years, two kinds of contextual models have emerged: models with contextual inference based on the statistical summary of the scene (we will refer to these as Scene Based Context models, or SBC), and models representing the context in terms of relationships among objects in the image (Object Based Context, or OBC). In designing object recognition systems, it is necessary to understand the theoretical and practical properties of such approaches. This work provides an analysis of these models and evaluates two of their representatives using the LabelMe dataset. We demonstrate a considerable margin of improvement using the OBC-style approach.
ISBN: 9781424439942 (print)
We investigate the problem of recognizing words from video, fingerspelled using the British Sign Language (BSL) fingerspelling alphabet. This is a challenging task since the BSL alphabet involves both hands occluding each other and contains signs which are ambiguous from the observer's viewpoint. The main contributions of our work include: (i) recognition based on hand shape alone, not requiring motion cues; (ii) robust visual features for hand shape recognition; (iii) scalability to large lexicon recognition with no re-training. We report results on a dataset of 1,000 low-quality webcam videos of 100 words. The proposed method achieves a word recognition accuracy of 98.9%.
ISBN: 9781424439942 (print)
Variations in pose, expression, illumination, aging and disguise are considered major challenges in face recognition, and several techniques have been proposed to address them. Plastic surgery, on the other hand, is considered an arduous research issue; however, it has not yet been studied either theoretically or experimentally. This paper focuses on analyzing the effect of plastic surgery on face recognition algorithms. The preliminary study provides an experimental and analytical comparison of face recognition algorithms on a plastic surgery database of 506 individuals. The experimental results indicate that existing face recognition algorithms perform poorly when matching pre- and post-surgery face images. The results also suggest that future face recognition systems must be able to address this issue, and hence that more research is needed in this area.
ISBN: 9781424439942 (print)
This paper addresses large-displacement diffeomorphic mapping registration from an optimal control perspective. This viewpoint leads to two complementary formulations. One approach requires the explicit computation of coordinate maps, whereas the other is formulated strictly in the image domain (thus making it also applicable to manifolds which require multiple coordinate charts). We discuss their intrinsic relation as well as the advantages and disadvantages of the two approaches. Further, we propose a novel formulation for unbiased image registration, which naturally extends to the case of time-series of images. We discuss numerical implementation details and carefully evaluate the properties of the alternative algorithms.
ISBN: 9781424439942 (print)
Detecting suspicious events in video from surveillance cameras has recently become an important task. Many trajectory-based descriptors have been developed, for example to detect people running or moving in the opposite direction. However, these trajectory-based descriptors do not work well in crowded environments such as airports and rail stations, because they assume perfect motion/object segmentation. In this paper, we present an event detection method using a dynamic texture descriptor, which is an extension of local binary patterns. The image sequences are divided into regions, and a flow is formed based on the similarity of the dynamic texture descriptors of the regions. We used a real dataset for the experiments. The results are promising.
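As background for the abstract above, the following is a minimal sketch of the basic spatial local binary pattern (LBP) operator on which the dynamic texture descriptor is said to build. It is not the paper's descriptor: the temporal extension, the region division and the flow construction are not specified in the abstract and are omitted. The function names, the clockwise neighbour ordering and the `>=` comparison are assumptions made for illustration.

```python
def lbp_code(image, row, col):
    """Return the 8-neighbour LBP code of an interior pixel.

    `image` is a 2D list of grayscale intensities. Each neighbour whose
    intensity is >= the centre pixel sets one bit of the code.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel
    # (the ordering is a convention chosen for this sketch).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

A region of a frame could then be summarised by `lbp_histogram(region)`, and two regions compared by a histogram distance; which distance the paper uses is not stated in the abstract.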
ISBN: 9781424439942 (print)
We describe a framework for face recognition at a distance based on sparse-stereo reconstruction. We develop a 3D acquisition system that consists of two CCD stereo cameras mounted on pan-tilt units with an adjustable baseline. We first detect the facial region and extract its landmark points, which are used to initialize an AAM mesh fitting algorithm. The fitted mesh vertices provide point correspondences between the left and right images of a stereo pair; stereo-based reconstruction is then used to infer the 3D information of the mesh vertices. We perform experiments regarding the use of different features extracted from these vertices for face recognition. The cumulative rank curves (CMC) generated using the proposed framework confirm the feasibility of the proposed work for long-distance recognition of human faces with respect to the state of the art.
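The stereo-based reconstruction step mentioned above can be illustrated with a minimal sketch of triangulating one matched vertex. It assumes an idealised rectified pinhole stereo pair (parallel optical axes, identical focal length, matching rows), which the abstract does not state; cameras on pan-tilt units would in general need a full calibration. The function name and all parameters are hypothetical.

```python
def triangulate_rectified(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Recover (X, Y, Z) in metres for one point seen in both images.

    x_left, x_right: column of the point in the left and right image (pixels)
    y:               row of the point, shared by both images after rectification
    focal_px:        focal length in pixels
    baseline_m:      distance between the two camera centres in metres
    cx, cy:          principal point in pixels
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Depth is inversely proportional to disparity: Z = f * B / d.
    z = focal_px * baseline_m / disparity
    # Back-project the left-image pixel to 3D at that depth.
    x = (x_left - cx) * z / focal_px
    y3d = (y - cy) * z / focal_px
    return (x, y3d, z)
```

Because depth varies as the inverse of disparity, a wider baseline gives larger disparities for distant faces, which is one plausible reason for making the baseline adjustable.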
The four papers in this special section are extended versions of award-winning papers from the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007).
ISBN: 9781424439928 (print)
This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start by producing a robust bag of overlaid regions for each image using Arbelaez et al., CVPR 2009. Each region is represented by a rich set of image cues (shape, color and texture). We then learn region weights using a max-margin framework. In detection and segmentation, we apply a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. The proposed approach significantly outperforms the state of the art on the ETHZ shape database (87.1% average detection rate compared to Ferrari et al.'s 67.2%), and achieves competitive performance on the Caltech 101 database.
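The generalized Hough voting step named above can be illustrated with a deliberately simplified sketch: each matched region casts a weighted vote for an object centre, votes are accumulated in a coarse grid, and the strongest bin becomes a hypothesis. This covers location only; the scale and support hypotheses, the learned max-margin weights and the matching procedure of the paper are not reproduced. The function name, the `(x, y, offset_x, offset_y, weight)` tuple layout and the bin size are assumptions for illustration.

```python
from collections import defaultdict


def hough_vote_centers(matches, bin_size=10.0):
    """Accumulate weighted centre votes and return the strongest hypothesis.

    matches: iterable of (region_x, region_y, offset_x, offset_y, weight),
             where the offset points from the region to the object centre
             as stored with a matched training region (hypothetical layout).
    Returns ((center_x, center_y), score) for the best bin, or None if
    there are no votes.
    """
    accumulator = defaultdict(float)
    for rx, ry, ox, oy, weight in matches:
        cx, cy = rx + ox, ry + oy
        # Quantise the voted centre into a grid cell.
        key = (int(cx // bin_size), int(cy // bin_size))
        accumulator[key] += weight
    if not accumulator:
        return None
    best = max(accumulator, key=accumulator.get)
    # Report the middle of the winning cell as the hypothesised centre.
    center = ((best[0] + 0.5) * bin_size, (best[1] + 0.5) * bin_size)
    return center, accumulator[best]
```

In a full pipeline each returned hypothesis would then be passed to a verification stage, as the abstract describes; that stage is outside the scope of this sketch.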
ISBN: 9781424439942 (print)
We demonstrate that it is possible to automatically find representative example images of a specified object category. These canonical examples are perhaps the kind of images that one would show a child to teach them what, for example, a horse is: images with a large object clearly separated from the background. Given a large collection of images returned by a web search for an object category, our approach proceeds without any user-supplied training data for the category. First, images are ranked according to a category-independent composition model that predicts whether they contain a large, clearly depicted object, and outputs an estimated location of that object. Then, local features calculated on the proposed object regions are used to eliminate images not distinctive to the category, and to cluster images by similarity of object appearance. We present results and a user evaluation on a variety of object categories, demonstrating the effectiveness of the approach.