ISBN: 9781424439928 (print)
Images of an object undergoing ego- or camera-motion often appear to be scaled, rotated, and deformed versions of each other. Detecting such distorted patterns and matching them to a single sample view of the object requires solving a hard computational problem that has eluded most object matching methods. We propose a linear formulation that simultaneously finds feature point correspondences and global geometrical transformations in a constrained solution space. By further reducing the search space based on the lower convex hull property of the formulation, our method scales well with the number of candidate features. Our results on a variety of images and videos demonstrate that our method is accurate, efficient, and robust to local deformation, occlusion, clutter, and large geometrical transformations.
ISBN: 9781424439942 (print)
Scene understanding in the context of a smart meeting room involves the extraction of various kinds of cues at different levels of semantic abstraction. Specifically, human activity in a scene is usually monitored using arrays of audio and visual sensors, for tasks such as person localization and tracking, speaker ID, focus of attention detection, speech recognition, and affective state recognition. In this paper we demonstrate a system that extracts such information by synergistically combining the information from the various tasks so that they support each other. We exploit the fact that the output of one kind of human activity analysis task contains valuable information for another such block; interconnecting them yields a robust system. We demonstrate this in a smart meeting room equipped with 3 cameras and 16 microphones. The system performs the tasks of person tracking, head pose estimation, beamforming, speaker ID, and speech recognition using audio and visual cues. The novelty lies in putting together the tasks such that they can provide relevant information to one another. We evaluate the performance of our system and present results for tasks such as keyword spotting and tracking re-identification on real-world meeting scenes collected in our audio-visual testbed.
ISBN: 9781424439928 (print)
We present a focus-based method to recover the orientation of a textured planar surface patch from a single image. The method exploits the relationship between the orientation of equifocal (i.e. uniformly-blurred) contours in the image and the plane's tilt and slant angles. Compared to previous methods that determine planar orientation, we make fewer assumptions about the texture and remove the restriction that images must be acquired through a pinhole aperture. Our method estimates slant and tilt of an image patch in a single image, as compared to depth from defocus methods that require two or more input images. Experiments are performed using a large set of test images.
ISBN: 9781424439928 (print)
We present a novel framework for recognizing repetitive sequential events performed by human actors with strong temporal dependencies and potential parallel overlap. Our solution incorporates sub-event (or primitive) detectors and a spatiotemporal model for sequential event changes. We develop an effective and efficient method to integrate primitives into a set of sequential events where strong temporal constraints are imposed on the ordering of the primitives. In particular, the combination process is approached as an optimization problem. A specialized Viterbi algorithm is designed to learn and infer the target sequential events and handle the event overlap simultaneously. To demonstrate the effectiveness of the proposed framework, we report detailed quantitative analysis on a large set of cashier check-out activities in a retail store.
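As a rough illustration of how ordering constraints on primitives can be enforced during inference, the sketch below runs a standard (not the authors' specialized) Viterbi decoder. The state names, the detector symbols "P", "S", "D", and all probabilities are hypothetical and chosen only for this example; forbidden orderings are encoded as zero-probability transitions.

```python
import math


def safe_log(p):
    """Log of a probability; zero maps to -inf so forbidden moves never win."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(observations, states, start, trans, emit):
    """Return the most likely state path and its log-probability."""
    # score[s]: best log-probability of any path that ends in state s so far
    score = {s: safe_log(start[s]) + safe_log(emit[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of s under the transition constraints
            prev = max(states, key=lambda p: score[p] + safe_log(trans[p][s]))
            new_score[s] = score[prev] + safe_log(trans[prev][s]) + safe_log(emit[s][obs])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical check-out primitives with a strict cyclic ordering:
# pickup -> scan -> drop -> pickup; each primitive may also persist.
STATES = ["pickup", "scan", "drop"]
START = {"pickup": 1.0, "scan": 0.0, "drop": 0.0}
TRANS = {
    "pickup": {"pickup": 0.5, "scan": 0.5, "drop": 0.0},
    "scan": {"pickup": 0.0, "scan": 0.5, "drop": 0.5},
    "drop": {"pickup": 0.5, "scan": 0.0, "drop": 0.5},
}
# Noisy primitive detectors: the correct symbol fires with probability 0.8
EMIT = {
    "pickup": {"P": 0.8, "S": 0.1, "D": 0.1},
    "scan": {"P": 0.1, "S": 0.8, "D": 0.1},
    "drop": {"P": 0.1, "S": 0.1, "D": 0.8},
}

# The detector output skips the scan, which the ordering forbids.
path, log_p = viterbi(["P", "D", "D"], STATES, START, TRANS, EMIT)
print(path)
```

In this toy run the detector output skips the scan primitive, yet the decoded path is pickup, scan, drop, because a direct pickup-to-drop transition has zero probability. Handling parallel overlap between events, as the abstract describes, is beyond this sketch.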
ISBN: 9781424439928 (print)
Maximum likelihood (ML) estimation is widely used in many computer vision problems involving the estimation of geometric parameters, from conic fitting to bundle adjustment for structure and motion. This paper presents a detailed discussion on the bias of ML estimates derived for these problems. Statistical theory states that although ML estimates attain maximum accuracy in the limit as the sample size goes to infinity, they can have non-negligible bias with small sample sizes. In the case of computer vision problems, the ML optimality holds when regarding variance in observation errors as the sample size. A natural question is how large the bias will be for a given strength of observation errors. To answer this for a general class of problems, we analyze the mechanism of how the bias of ML estimates emerges, and show that the differential geometric properties of the geometric constraints used in the problems determine the magnitude of the bias. Based on this result, we present a numerical method of computing bias-corrected estimates.
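A textbook illustration of small-sample ML bias (a standard statistical example, not taken from the paper): for n independent samples x_1, ..., x_n drawn from a normal distribution with unknown mean and variance σ², the ML variance estimate is biased downward by σ²/n, and the bias vanishes only as n grows.

```latex
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\mathbb{E}\!\left[\hat{\sigma}^2_{\mathrm{ML}}\right] = \frac{n-1}{n}\,\sigma^2,
\qquad
\mathbb{E}\!\left[\hat{\sigma}^2_{\mathrm{ML}}\right] - \sigma^2 = -\frac{\sigma^2}{n}.
```

In this simple case the bias is removed by rescaling the estimate by n/(n-1). The geometric problems discussed in the abstract have no such closed-form correction in general, which is the motivation for a numerical method.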
ISBN: 9781424439928 (print)
Pose variation leads to a significant performance decline in face recognition systems and remains a bottleneck in face recognition. A key problem is how to measure the similarity between two image vectors of unequal length that are viewed from different poses. In this paper, we propose a novel approach for pose-robust face recognition, in which the similarity is measured by correlations in a media subspace between different poses at the patch level. The media subspace is constructed by Canonical Correlation Analysis, such that the intra-individual correlations are maximized. Based on the media subspace, two recognition approaches are developed. In the first, we transform a non-frontal face into a frontal one for recognition. In the second, we perform recognition in the media subspace with probabilistic modeling. The experimental results on the FERET database demonstrate the efficiency of our approach.
ISBN: 9781424439928 (print)
We present a fast graph cut algorithm for planar graphs. It is based on the graph theoretical work [2] and leads to an efficient method that we apply to shape matching and image segmentation. In contrast to currently used methods in computer vision, the presented approach provides an upper bound for its runtime behavior that is almost linear. In particular, we are able to match two different planar shapes of N points in O(N^2 log N) and segment a given image of N pixels in O(N log N). We present two experimental benchmark studies which demonstrate that the presented method is also faster in practice than previously proposed graph cut methods: on planar shape matching and image segmentation we observe a speed-up of an order of magnitude, depending on resolution.
ISBN: 9781424439942 (print)
In this paper, we present an approach for human action recognition with extremities as a compact semantic posture representation. First, we develop a variable star skeleton representation (VSS) in order to accurately find human extremities from contours. Earlier, Fujiyoshi and Lipton [7] proposed an image skeletonization technique with the center of mass as a single star for rapid motion analysis. Yu and Aggarwal [18] used the highest contour point as the second star in their application for fence climbing detection. We implement VSS and the earlier algorithms from [7, 18], and compare their performance over a set of 1000 frames from 50 sequences of persons climbing fences to analyze the characteristics of each representation. Our results show that VSS performs the best. Second, we build feature vectors out of detected extremities for Hidden Markov Model (HMM) based human action recognition. On the data set of humans climbing fences, we achieved excellent classification accuracy. On the publicly available Blank et al. [3] data set, our approach showed that using only extremities is sufficient to obtain classification accuracy comparable to other state-of-the-art methods. The advantage of our approach lies in its lower time complexity with comparable classification accuracy.
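To make the single-star idea concrete, here is a minimal sketch of a centroid-based star skeleton in the spirit of the technique attributed to [7]: contour points whose distance from the center of mass is a local maximum are treated as extremities. It is a simplification under stated assumptions (an ordered, closed contour, no smoothing of the distance signal, the centroid taken over contour points) and does not implement VSS; the function name and the toy contour are hypothetical.

```python
import math


def star_extremities(contour):
    """Return the contour centroid and the contour points that are
    local maxima of distance from it (treated here as extremities)."""
    n = len(contour)
    # Center of mass of the contour points acts as the single "star"
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    extremities = []
    for i in range(n):
        # The contour is closed, so neighbours wrap around at both ends
        if dist[i] > dist[i - 1] and dist[i] > dist[(i + 1) % n]:
            extremities.append(contour[i])
    return (cx, cy), extremities


# Toy closed contour with four spikes, listed in order around the shape
contour = [(3, 0), (1, 1), (0, 3), (-1, 1), (-3, 0), (-1, -1), (0, -3), (1, -1)]
center, tips = star_extremities(contour)
print(center, tips)
```

On real silhouettes the distance signal is noisy, so a smoothing step before peak detection would normally be needed; it is omitted here to keep the sketch short.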
ISBN: 0780342364 (print)
We present a new, efficient stereo algorithm addressing robust disparity estimation in the presence of occlusions. The algorithm is an adaptive, multi-window scheme using left-right consistency to compute disparity and its associated uncertainty. We demonstrate and discuss its performance with both synthetic and real stereo pairs, and show how our results improve on those of closely related techniques in both robustness and efficiency.
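The left-right consistency test mentioned above can be sketched on a single scanline. This is only the generic consistency check, not the adaptive multi-window scheme itself. It assumes integer disparities and the convention that left pixel x matches right pixel x - d; the function name and the tolerance parameter are hypothetical.

```python
def left_right_check(disp_left, disp_right, tol=1):
    """Keep a left disparity only if the right image agrees with it.

    disp_left[x] is the disparity of left pixel x, assumed to match
    right pixel x - disp_left[x]. Pixels failing the check are set to
    None, which typically flags occlusions or mismatches.
    """
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right scanline
        if 0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= tol:
            checked.append(d)
        else:
            checked.append(None)
    return checked


# Toy scanline: pixel 0 maps outside the right image, pixel 3 is an outlier
left = [1, 1, 1, 3, 1]
right = [1, 1, 1, 1, 1]
print(left_right_check(left, right))
```

In this toy case the first pixel is rejected because its match falls outside the right scanline, and the fourth because the right image disagrees by more than the tolerance.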
ISBN: 9781424469840 (print)
We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object-independent process. Decisions are made based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in framing recognition as a regression problem. Instead of focusing on a one-vs-all winning margin that can scramble ordering inside the non-maximum (non-winning) set, learning produces a globally consistent ranking with close ties to segment quality, and hence to the extent to which entire object or part hypotheses spatially overlap with the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection, and semantic segmentation on a number of challenging datasets including Caltech-101, ETHZ-Shape, and PASCAL VOC 2009.
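As a small illustration of ranking segment hypotheses by spatial overlap with the ground truth, the sketch below scores each hypothesis with intersection-over-union. The abstract does not state the exact overlap formula, so intersection-over-union is an assumption here, and the pixel sets and hypothesis names are made up. In the described approach such a continuous score would serve as a regression target; the regressor itself is not shown.

```python
def overlap(segment, ground_truth):
    """Intersection-over-union of two pixel sets, a value in [0, 1]."""
    segment, ground_truth = set(segment), set(ground_truth)
    union = segment | ground_truth
    if not union:
        return 0.0  # two empty masks: define the overlap as zero
    return len(segment & ground_truth) / len(union)


def rank_hypotheses(hypotheses, ground_truth):
    """Sort hypothesis names by decreasing overlap with the ground truth."""
    return sorted(hypotheses, key=lambda name: overlap(hypotheses[name], ground_truth), reverse=True)


# Hypothetical 2x2 ground-truth object and three segment hypotheses
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
hypotheses = {
    "half": {(0, 0), (0, 1)},
    "loose": {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)},
    "miss": {(5, 5)},
}
ranking = rank_hypotheses(hypotheses, truth)
print(ranking)
```

A continuous score of this kind orders all hypotheses, including the non-winning ones, which a one-vs-all margin does not guarantee.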