ISBN: 9781424439928 (print)
We present a fast graph cut algorithm for planar graphs. It is based on the graph-theoretical work [2] and leads to an efficient method that we apply to shape matching and image segmentation. In contrast to methods currently used in computer vision, the presented approach provides an upper bound on its runtime that is almost linear. In particular, we are able to match two different planar shapes of N points in O(N^2 log N) and segment a given image of N pixels in O(N log N). We present two experimental benchmark studies which demonstrate that the presented method is also faster in practice than previously proposed graph cut methods: on planar shape matching and image segmentation we observe a speed-up of an order of magnitude, depending on resolution.
ISBN: 9781424439942 (print)
In this paper, we present an approach for human action recognition with extremities as a compact semantic posture representation. First, we develop a variable star skeleton representation (VSS) in order to accurately find human extremities from contours. Earlier, Fujiyoshi and Lipton [7] proposed an image skeletonization technique with the center of mass as a single star for rapid motion analysis. Yu and Aggarwal [18] used the highest contour point as the second star in their application for fence climbing detection. We implement VSS and the earlier algorithms from [7, 18], and compare their performance over a set of 1000 frames from 50 sequences of persons climbing fences to analyze the characteristics of each representation. Our results show that VSS performs the best. Second, we build feature vectors out of detected extremities for Hidden Markov Model (HMM) based human action recognition. On the data set of humans climbing fences, we achieved excellent classification accuracy. On the publicly available Blank et al. [3] data set, our approach showed that using only extremities is sufficient to obtain classification accuracy comparable to that of other state-of-the-art methods. The advantage of our approach lies in its lower time complexity at comparable classification accuracy.
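The single-star idea attributed above to [7] can be illustrated with a minimal sketch: take the center of mass of the contour as the star, measure the distance from it to every contour point, smooth that signal, and report local maxima as extremities. This is not the VSS method of the abstract and not the authors' code; the function name `star_skeleton_extremities`, the moving-average smoothing, and the `window` parameter are assumptions made for illustration.

```python
import math


def star_skeleton_extremities(contour, window=2):
    """Return contour points that are local maxima of distance to the centroid.

    contour: list of (x, y) points ordered along a closed boundary.
    window:  half-width of a circular moving average (an assumed smoothing choice).
    """
    n = len(contour)
    # The center of mass of the boundary points acts as the single "star".
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    # A circular moving average suppresses small, noise-induced maxima.
    smooth = [
        sum(dist[(i + k) % n] for k in range(-window, window + 1)) / (2 * window + 1)
        for i in range(n)
    ]
    # Keep points whose smoothed distance is a strict local maximum.
    return [
        contour[i]
        for i in range(n)
        if smooth[i] > smooth[i - 1] and smooth[i] > smooth[(i + 1) % n]
    ]
```

On a synthetic four-lobed contour this returns one point per lobe tip. A real pipeline would first extract the silhouette contour from the image, which is outside the scope of this sketch.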
ISBN: 9781424439942 (print)
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. A multimodal data fusion at score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows the performance of the presented algorithms to be assessed. This dataset is made publicly available for research purposes.
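Score-level fusion by weighted mean, the simpler of the two approaches named above, might look like the following sketch. It assumes each modality produces one score per event class on a comparable scale; the function names, the single `audio_weight` parameter, and the event labels in the usage note are hypothetical and are not taken from the paper.

```python
def fuse_scores(audio_scores, video_scores, audio_weight=0.5):
    """Weighted-mean fusion of per-class scores from two modalities.

    audio_scores, video_scores: dicts mapping class label -> score.
    audio_weight: weight of the audio modality in [0, 1]; video gets the rest.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_scores.keys() != video_scores.keys():
        raise ValueError("both modalities must score the same classes")
    return {
        label: audio_weight * audio_scores[label]
        + (1.0 - audio_weight) * video_scores[label]
        for label in audio_scores
    }


def classify(fused_scores):
    """Pick the class with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)
```

For example, with audio scores `{"door_slam": 0.6, "applause": 0.4}` and video scores `{"door_slam": 0.2, "applause": 0.8}` at equal weights, the fused decision is `"applause"`. The fuzzy integral variant is not shown here.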
ISBN: 9781424439928 (print)
Efficient view registration with respect to a given 3D reconstruction has many applications, such as inside-out tracking in indoor and outdoor environments and geo-locating images from large photo collections. We present a fast location recognition technique based on structure-from-motion point clouds. Vocabulary tree-based indexing of features directly returns relevant fragments of 3D models instead of documents from the image database. Additionally, we propose a compressed 3D scene representation which improves recognition rates while simultaneously reducing the computation time and the memory consumption. The design of our method is based on algorithms that efficiently utilize modern graphics processing units to deliver real-time performance for view registration. We demonstrate the approach by matching hand-held outdoor videos to known 3D urban models, and by registering images from online photo collections to the corresponding landmarks.
ISBN: 9781424469840 (print)
We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object-independent process. Decisions are made based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in framing recognition as a regression problem. Instead of focusing on a one-vs-all winning margin that can scramble ordering inside the non-maximum (non-winning) set, learning produces a globally consistent ranking with close ties to segment quality, hence to the extent to which entire object or part hypotheses spatially overlap with the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation on a number of challenging datasets, including Caltech-101, ETHZ-Shape and PASCAL VOC 2009.
ISBN: 9781424439928 (print)
We propose a novel approach to designing algorithms for object tracking based on fusing multiple observation models. As the space of possible observation models is too large for exhaustive on-line search, this work aims to select models that are suitable for the particular tracking task at hand. During an off-line training stage, observation models from various off-the-shelf trackers are evaluated. From these data, different methods of fusing the observers on-line are investigated, including parallel and cascaded evaluation. Experiments on test sequences show that this evaluation is useful for automatically designing and assessing algorithms for a particular tracking task. Results are shown for face tracking with a handheld camera and hand tracking for gesture interaction. We show that for these cases combining a small number of observers in a sequential cascade results in efficient algorithms that are both robust and precise.
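A sequential cascade of observers, as mentioned above, can be sketched as a chain of filters in which each observer scores the candidates that survived the previous stage. This is only an illustration of cascaded evaluation in general: the function name, the `(score_fn, threshold)` representation, and the fallback rule for an empty stage are assumptions, not details from the paper.

```python
def cascade_filter(candidates, observers):
    """Pass candidate states through observers in order, pruning at each stage.

    candidates: iterable of candidate object states (any type).
    observers:  list of (score_fn, threshold) pairs, assumed to be ordered
                from cheap to expensive.
    Returns the candidates that survive every stage.
    """
    survivors = list(candidates)
    for score_fn, threshold in observers:
        if not survivors:
            return []
        kept = [c for c in survivors if score_fn(c) >= threshold]
        # Assumed fallback: if a stage rejects everything, keep its best
        # candidate so that the track is not dropped outright.
        survivors = kept if kept else [max(survivors, key=score_fn)]
    return survivors
```

Because later observers only see the survivors of earlier ones, an expensive observer placed last is evaluated on few candidates, which is one plausible reason a cascade can be efficient.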
ISBN: 9781424439942 (print)
In this paper we address the problem of localisation and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal localization of characteristic, sparse 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, where we take the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur, are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation in order to spatially segment the subject that performs the activities in every frame, and the Radon transform in order to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
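Mean shift mode estimation, used above to localize the subject from accumulated votes, can be illustrated with a generic sketch: starting from an initial guess, repeatedly move to the mean of the points inside a fixed-radius window until the position stops changing. The flat (uniform) kernel, the parameter names, and the stopping rule are assumptions for illustration; the paper's exact kernel and vote weighting are not given in the abstract.

```python
import math


def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Find a local density mode of 2-D points with a flat-kernel mean shift.

    points:    list of (x, y) samples, e.g. votes for an object centre.
    start:     (x, y) initial estimate.
    bandwidth: radius of the window used to compute each local mean.
    """
    x, y = start
    for _ in range(max_iter):
        # Collect the samples that fall inside the current window.
        near = [
            (px, py)
            for px, py in points
            if (px - x) ** 2 + (py - y) ** 2 <= bandwidth ** 2
        ]
        if not near:
            break  # empty window: nothing to shift towards
        new_x = sum(p[0] for p in near) / len(near)
        new_y = sum(p[1] for p in near) / len(near)
        shift = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if shift < tol:
            break  # converged to a mode
    return x, y
```

Running the procedure from several starting points and merging nearby results yields one mode per cluster of votes, which is how multiple subjects in the same scene could be separated under these assumptions.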
ISBN: 9781424439928 (print)
The accurate localization of facial features plays a fundamental role in any face recognition pipeline. Constrained local models (CLM) provide an effective approach to localization by coupling ensembles of local patch detectors for non-rigid object alignment. A recent improvement has been made by using generic convex quadratic fitting (CQF), which elegantly addresses the CLM warp update by enforcing convexity of the patch response surfaces. In this paper, CQF is generalized to a Bayesian inference problem, in which it appears as a particular maximum likelihood solution. The Bayesian viewpoint holds many advantages: for example, the task of feature localization can explicitly build on previous face detection stages, and multiple sets of patch responses can be seamlessly incorporated. A second contribution of the paper is an analytic solution to finding convex approximations to patch response surfaces, which removes CQF's reliance on a numeric optimizer. Improvements in feature localization performance are illustrated on the Labeled Faces in the Wild and BioID data sets.
ISBN: 0780342364 (print)
Three different statistical models of colour data for use in segmentation or tracking algorithms are proposed. Results of a performance comparison of a tracking algorithm, applied to two separate applications, using each of the three different types of underlying model of the data are presented. From these a comparison of the performance of the statistical colour models themselves is obtained.
ISBN: 9781424439928 (print)
This paper presents a nonparametric discriminant HMM and applies it to facial expression recognition. In the proposed HMM, we introduce an effective nonparametric output probability estimation method to increase the discrimination ability at both the hidden state level and the class level. The proposed method uses a nonparametric adaptive kernel to utilize information from all classes and improve the discrimination at class level. The discrimination between hidden states is increased by defining membership coefficients which associate each reference vector with hidden states. The adaptation of such coefficients is obtained by the Expectation Maximization (EM) method. Furthermore, we present a general formula for the estimation of output probability, which provides a way to develop new HMMs. Finally, we evaluate the performance of the proposed method on the CMU expression database and compare it with other nonparametric HMMs.