ISBN: (Print) 9781424439942
We present a new model for scene context based on the distribution of textons within images. Our approach provides continuous, consistent scene gist throughout a video sequence and is suitable for applications in which the camera regularly views uninformative parts of the scene. We show that our model outperforms the state-of-the-art for place recognition. We further show how to deduce the camera orientation from our scene gist and finally show how our system can be applied to active object search.
Contextual models play a very important role in the task of object recognition. Over the years, two kinds of contextual models have emerged: models with contextual inference based on the statistical summary of the scene (we will refer to these as scene-based context models, or SBC), and models representing the context in terms of relationships among objects in the image (object-based context, or OBC). In designing object recognition systems, it is necessary to understand the theoretical and practical properties of such approaches. This work provides an analysis of these models and evaluates two of their representatives using the LabelMe dataset. We demonstrate a considerable margin of improvement using the OBC-style approach.
We consider one of the most basic questions in computer vision, that of finding a low-level image representation that could be used to seed diverse, subsequent computations of image understanding. Can we define a relatively general-purpose image representation which would serve as the syntax for diverse needs of image understanding? What makes good image syntax? How do we evaluate it? We pose a series of such questions and evolve a set of answers to them, which in turn help evolve an image representation. For concreteness, we first perform this exercise in the specific context of the following problem.
Summary form only given: In this paper, I briefly review three formulations of the many-to-many matching problem as applied to model acquisition, model indexing, and object recognition. In the first scenario, I will describe the problem of learning a prototypical shape model from a set of exemplars in which the exemplars may not share a single local feature in common. We formulate the problem as a search through the intractable space of feature combinations, or abstractions, to find the "lowest common abstraction" that is derivable from each input exemplar. This abstraction, in turn, defines a many-to-many feature correspondence among the extracted input features.
Two major obstacles to the use of consumer camcorders in computer vision applications are the lack of synchronization hardware, and the use of a "rolling" shutter, which introduces a temporal shear in the video volume. We present two simple approaches for solving both the rolling shutter shear and the synchronization problem at the same time. The first approach is based on strobe illumination, while the second employs a subframe warp along optical flow vectors. In our experiments we have used the proposed methods to effectively remove temporal shear, and synchronize up to 16 consumer-grade camcorders in multiple geometric configurations.
Laughter detection is an important area of interest in the Affective Computing and Human-Computer Interaction fields. In this paper, we propose a multi-modal methodology based on the fusion of audio and visual cues to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectrogram, and the video features are obtained by estimating the degree of mouth movement and by using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results over videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology significantly outperforms the results obtained by an AdaBoost classifier.
ISBN: (Print) 9781424439942
This paper presents a system intended to serve as the enabling platform for a wearable assistant. The method observes manipulations from a wearable camera and classifies activities from roughly stabilized low-resolution images (160×120 pixels) with the help of a 3-level Dynamic Bayesian Network and adapted temporal templates. Our motivation is to explore robust but computationally inexpensive visual methods to perform as much activity inference as possible without resorting to more complex object or hand detectors. The description of the method and the results obtained are presented, as well as the motivation for further work in the area of wearable visual sensing.
Automatic mouth detection in a complex background is one of the most challenging and significant tasks in the field of computer vision and pattern recognition. A new mouth detection method that uses the relationships ...
Summary form only given. Visual categorization, recognition, and detection of objects have been an area of active research in the vision community for decades. Ultimately, the goal is to recognize and detect a large number of object classes in images within an acceptable time frame. This problem entangles three highly interconnected issues: the internal object representation, which should expand sublinearly with the number of classes; the means to learn the representation from a set of images; and an effective inference algorithm that matches the object representation against the representation produced from the scene. In the main part of the talk I will present our framework for learning a hierarchical compositional representation of multiple object classes. Learning is unsupervised, statistical, and is performed bottom-up. The approach takes simple contour fragments and learns their frequent spatial configurations, which recursively combine into increasingly complex and class-specific contour compositions.
This paper proposes a novel tensor-based dimensionality reduction algorithm called Multilinear Isometric Embedding (MIE), based on a representative manifold learning algorithm, Isomap. Unlike Isomap that unfolds input d...