ISBN: 9781424422425 (print)
We present a method capable of tracking and estimating the pose of articulated objects in real time. This is achieved by using a bottom-up approach to detect instances of the object in each frame; these detections are then linked together using a high-level a priori motion model. Unlike other approaches that rely on appearance, our method depends entirely on motion: initial low-level part detection is based on how a region moves rather than on its appearance. This work is best described as Pictorial Structures using motion. A sparse cloud of points extracted with a standard feature tracker is used as observational data; these data contain noise that is not Gaussian in nature but systematic, owing to tracking errors. Using a probabilistic framework, we are able to overcome both corrupt and missing data while still inferring new poses from a generative model. Our approach requires no manual initialisation, and we show results for a number of complex scenes and different classes of articulated object, demonstrating both the robustness and the versatility of the presented technique.
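Pictorial Structures models are typically solved by dynamic programming over a tree of parts. Below is a minimal sketch for the simplest tree, a chain of parts, assuming each part already has a few candidate locations with a unary cost (hypothetically standing in for the motion-based part detections) and a quadratic spring cost between neighbouring parts. The names `chain_pictorial_structure`, `rest_offsets` and `stiffness` are illustrative, not from the paper, and the handling of corrupt or missing data is not modelled.

```python
import numpy as np

def chain_pictorial_structure(candidates, unary, rest_offsets, stiffness=1.0):
    """Min-sum dynamic programming over a chain of parts (hypothetical sketch).

    candidates[k]:   (n_k, 2) array of candidate 2D locations for part k.
    unary[k]:        (n_k,) array of detection costs (lower is better).
    rest_offsets[k]: expected displacement from part k to part k+1.
    Returns the chosen candidate index for each part.
    """
    # cost[j] = best total cost of a partial chain whose last part is candidate j
    cost = np.asarray(unary[0], dtype=float)
    backptr = []
    for k in range(1, len(candidates)):
        prev = np.asarray(candidates[k - 1], dtype=float)
        cur = np.asarray(candidates[k], dtype=float)
        # Spring cost: squared deviation from the rest offset, shape (n_{k-1}, n_k)
        diff = cur[None, :, :] - prev[:, None, :] - np.asarray(rest_offsets[k - 1])
        pair = stiffness * np.sum(diff ** 2, axis=-1)
        total = cost[:, None] + pair
        backptr.append(np.argmin(total, axis=0))  # best previous candidate per current one
        cost = total.min(axis=0) + np.asarray(unary[k], dtype=float)
    # Backtrack from the cheapest final candidate
    best = [int(np.argmin(cost))]
    for bp in reversed(backptr):
        best.append(int(bp[best[-1]]))
    return best[::-1]
```

On a tree rather than a chain, the same message passing runs from the leaves to the root, which is what keeps inference linear in the number of parts.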
ISBN: 9781424422425 (print)
The Middlebury Multi-View Stereo evaluation [18] clearly shows that the quality and speed of most multi-view stereo algorithms depend significantly on the number and selection of input images. In general, not all input images contribute equally to the quality of the output model, since several images often contain similar, and hence redundant, visual information. This leads to unnecessarily long processing times. On the other hand, a certain degree of redundancy can help improve the reconstruction in more "difficult" regions of a model. In this paper we propose an image selection scheme for multi-view stereo that yields better reconstruction quality than uniformly distributed views. Our method is tuned to the typical requirements of current multi-view stereo algorithms and is based on incrementally selecting images so that the overall coverage of a simultaneously generated proxy is guaranteed without adding too much redundant information. Critical regions such as cavities are detected by estimating the local photo-consistency and are improved by adding further views. Our method is highly efficient, since most computations can be offloaded to the GPU. We evaluate our method in combination with four different algorithms participating in the Middlebury benchmark and show that in each case reconstructions based on our selected images yield improved output quality while considerably reducing processing time.
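The incremental selection idea can be illustrated with a greedy coverage sketch. It assumes each candidate view comes with the set of proxy points it sees (a hypothetical `visibility` input) and repeatedly adds the view that covers the most still-uncovered points, so near-duplicate views gain little and are skipped. The photo-consistency test for cavities and the GPU implementation are not modelled.

```python
def select_views(visibility, coverage_target=0.95, min_gain=1):
    """Greedy incremental view selection (hypothetical sketch).

    visibility: dict mapping view id -> set of proxy point ids seen by that view.
    Adds the view covering the most not-yet-covered points until the target
    fraction of all points is covered or no view adds at least `min_gain` points.
    Returns (selected view ids, achieved coverage fraction).
    """
    all_points = set().union(*visibility.values())
    covered, selected = set(), []
    remaining = dict(visibility)
    while remaining and len(covered) < coverage_target * len(all_points):
        # Marginal gain = number of newly covered points; redundant views score low
        view, pts = max(remaining.items(), key=lambda kv: len(kv[1] - covered))
        if len(pts - covered) < min_gain:
            break
        selected.append(view)
        covered |= pts
        del remaining[view]
    return selected, len(covered) / max(len(all_points), 1)
```

For example, with `{"a": {1, 2, 3, 4}, "b": {1, 2, 3}, "c": {5, 6}, "d": {4, 5}}`, the sketch picks `a` and then `c`; view `b`, whose points are a subset of `a`'s, is never added.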
Dimensionality reduction has recently been studied extensively for computer vision applications. We present a novel approach, based on multilinear algebra, to the reduced-dimensionality representation of multidimensional data such as image ensembles, video sequences and volume data. Unlike traditional dimensionality reduction techniques such as PCA, we do not convert the data into a vector before reducing its dimensionality. Our approach works directly on the multidimensional form of the data (a matrix in 2D and a tensor in higher dimensions) to yield what we call a Datum-as-Is representation. This helps exploit spatio-temporal redundancies with less information loss than image-as-vector methods. An efficient rank-R tensor approximation algorithm is presented to approximate higher-order tensors. We show that rank-R tensor approximation using the Datum-as-Is representation generalizes many existing approaches that use an image-as-matrix representation, such as generalized low rank approximation of matrices (GLRAM) (Ye, J. in Mach. Learn. 61: 167-191, 2005), rank-one decomposition of matrices (RODM) (Shashua, A., Levin, A. in CVPR '01: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 42, 2001) and rank-one decomposition of tensors (RODT) (Wang, H., Ahuja, N. in ICPR '04: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 1, pp. 44-47, 2004). Our approach yields the most compact data representation among all known image-as-matrix methods. In addition, we propose another rank-R tensor approximation algorithm based on slice projection of third-order tensors, which needs fewer iterations to converge in the important special case of 2D image ensembles, e.g., video. We evaluated the performance of our approach against other approaches on a number of datasets, with the following two main results. First, for a fixed compression ratio, the proposed algorithm yields the best representation of image ...
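For reference, GLRAM, one of the image-as-matrix methods the Datum-as-Is representation is said to generalize, fits in a few lines of NumPy. It alternately fixes one of two orthonormal projection matrices `L` and `R` and updates the other from the leading eigenvectors of a scatter matrix, compressing each image `A_i` to a small core `L^T A_i R`. This is a sketch of GLRAM under our own choices (random orthonormal initialisation, fixed iteration count), not the paper's rank-R tensor algorithm.

```python
import numpy as np

def glram(images, l1, l2, n_iter=20, seed=0):
    """GLRAM sketch: orthonormal L (r x l1), R (c x l2) maximising sum_i ||L^T A_i R||_F^2.

    images: array-like of shape (n, r, c). Returns L, R and cores D of shape (n, l1, l2).
    """
    A = np.asarray(images, dtype=float)
    n, r, c = A.shape
    rng = np.random.default_rng(seed)
    # Assumption: random orthonormal initialisation of R
    R, _ = np.linalg.qr(rng.standard_normal((c, l2)))
    for _ in range(n_iter):
        # Fix R: L = top-l1 eigenvectors of sum_i A_i R R^T A_i^T
        ML = sum(Ai @ R @ R.T @ Ai.T for Ai in A)
        _, V = np.linalg.eigh(ML)          # eigenvalues in ascending order
        L = V[:, ::-1][:, :l1]
        # Fix L: R = top-l2 eigenvectors of sum_i A_i^T L L^T A_i
        MR = sum(Ai.T @ L @ L.T @ Ai for Ai in A)
        _, V = np.linalg.eigh(MR)
        R = V[:, ::-1][:, :l2]
    D = np.einsum('ri,nrc,cj->nij', L, A, R)  # cores D_i = L^T A_i R
    return L, R, D
```

Each image is reconstructed as `L @ D[i] @ R.T`, so storage drops from `n*r*c` numbers to `r*l1 + c*l2 + n*l1*l2`.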
Human beings are able to learn to recognize a new visual category from only one or a few training examples. Part of this ability may come from using knowledge gained in previous visual experiences. We show that such knowledge can be expressed as a set of "universal" visual features, which are learned from randomly collected natural scene images. Using these visual features, we obtain state-of-the-art performance on several classification tasks with a single-layer classifier.
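The abstract does not say how the "universal" features are learned. As a swapped-in illustration only, the sketch below learns a patch dictionary with k-means on randomly sampled patches and encodes patches with a "triangle" soft assignment; the resulting feature vectors could then feed any linear (single-layer) classifier, which is not shown. Patch size, dictionary size and the encoding are all assumptions.

```python
import numpy as np

def learn_dictionary(images, patch=6, n_patches=2000, k=32, n_iter=20, seed=0):
    """Learn k patch features by k-means on random patches (assumed technique)."""
    rng = np.random.default_rng(seed)
    X = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch + 1)
        c = rng.integers(img.shape[1] - patch + 1)
        X.append(img[r:r + patch, c:c + patch].ravel())
    X = np.asarray(X, dtype=float)
    # Per-patch contrast normalisation
    X = (X - X.mean(1, keepdims=True)) / (X.std(1, keepdims=True) + 1e-8)
    C = X[rng.choice(len(X), k, replace=False)]  # initial centroids (a copy)
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if np.any(assign == j):  # leave empty clusters unchanged
                C[j] = X[assign == j].mean(0)
    return C

def encode(patches, C):
    """'Triangle' soft encoding: f_j = max(0, mean distance - distance to centroid j)."""
    d = np.sqrt(((patches[:, None, :] - C[None]) ** 2).sum(-1))
    return np.maximum(0.0, d.mean(1, keepdims=True) - d)
```

Patches passed to `encode` should be normalised in the same way as inside `learn_dictionary`.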
We propose a technique for cheap and efficient acquisition of mesostructure normal maps from specularities, requiring only a simple LCD monitor and a digital camera. Coded illumination enables us to capture subtle surface details with only a handful of images. In addition, our method can deal with heterogeneous surfaces and high-albedo materials. We are able to recover highly detailed mesostructures, which was previously possible only with expensive hardware setups.
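One common form of coded illumination is a sequence of Gray-code stripe patterns: each camera pixel records one bit per pattern, and decoding the bits identifies which monitor pixel it sees reflected. For a mirror-like (specular) reflection, the normal is then the unit bisector of the directions towards the camera and towards that monitor pixel. The sketch assumes Gray codes and a calibrated geometry that maps monitor pixels to 3D points; the paper's actual codes and calibration are not described here.

```python
import numpy as np

def gray_encode(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_decode(bits):
    """Decode Gray bits observed across the pattern images (MSB first)."""
    value, acc = 0, 0
    for b in bits:
        acc ^= b                      # binary bit = XOR of all Gray bits so far
        value = (value << 1) | acc
    return value

def specular_normal(surface_pt, camera_pt, monitor_pt):
    """Mirror reflection: the normal bisects the directions to camera and light.

    monitor_pt is the 3D position of the decoded monitor pixel (assumed calibrated).
    """
    v = camera_pt - surface_pt
    l = monitor_pt - surface_pt
    h = v / np.linalg.norm(v) + l / np.linalg.norm(l)
    return h / np.linalg.norm(h)
```

Gray codes are a natural choice here because neighbouring monitor pixels differ in one bit only, so a pixel straddling a stripe boundary is off by at most one position.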
This paper describes a method for finding wide-baseline correspondences between images at locations along gradient edges. We find edges in scale space using established methods and develop invariant descriptors for these edges based on orientation and scale histograms. Because edges often lie on occluding boundaries, we calculate and store two descriptors per edge, one for each side, for robustness to occlusion. We demonstrate the effectiveness of edge matching in three applications: wide-baseline correspondence, structure from motion from line segments, and object category recognition on the Caltech 101 dataset.
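The two-sided descriptor idea can be sketched as follows: around an edge point, pixels are split by the sign of their offset along the edge normal, and a magnitude-weighted gradient-orientation histogram is built for each side, with orientations measured relative to the normal for rotation invariance. Only orientation histograms are shown; the scale histograms, the scale-space edge detection and the parameters `radius` and `n_bins` are assumptions of this sketch.

```python
import numpy as np

def two_sided_descriptors(img, point, normal_angle, radius=8, n_bins=8):
    """Orientation histograms on each side of an edge point (hypothetical sketch).

    point: (row, col) of the edge pixel; normal_angle: edge normal in radians,
    in (x=col, y=row) image coordinates. Returns L1-normalised (left, right).
    """
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    # Orientation relative to the edge normal, in [0, 2*pi)
    ori = (np.arctan2(gy, gx) - normal_angle) % (2 * np.pi)
    rows, cols = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    dx, dy = cols - point[1], rows - point[0]
    inside = dx ** 2 + dy ** 2 <= radius ** 2
    side = dx * np.cos(normal_angle) + dy * np.sin(normal_angle)  # signed offset along normal
    bins = np.minimum((ori / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hists = []
    for mask in (inside & (side < 0), inside & (side > 0)):
        h = np.bincount(bins[mask], weights=mag[mask], minlength=n_bins)
        hists.append(h / h.sum() if h.sum() > 0 else h)
    return hists[0], hists[1]
```

If an occluder changes the texture on one side of the edge, only that side's histogram is affected, so the other can still be matched.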
We propose a joint representation and classification framework that achieves the dual goal of finding the most discriminative sparse overcomplete encoding and the optimal classifier parameters. We formulate an optimization problem that combines the objective function of the classifier with the representation error of both labeled and unlabeled data, subject to a sparsity constraint, and propose an algorithm that alternates between solving for subsets of the parameters while preserving sparsity. The method is evaluated on two important classification problems in computer vision: object categorization of natural images using the Caltech 101 database, and face recognition using the Extended Yale B face database. The results show that the proposed method is competitive with other recently proposed sparse overcomplete methods and considerably outperforms many recently proposed face recognition techniques when the number of training samples is small.
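Within an alternating scheme like the one described, one subproblem is computing a sparse code for a sample given a fixed overcomplete dictionary. As an assumption (the abstract does not name a solver), the sketch uses ISTA, iterative shrinkage-thresholding for the L1-penalised least-squares problem; the classifier and unlabeled-data terms of the joint objective are omitted.

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=200):
    """Sparse code a = argmin 0.5*||x - D a||^2 + lam*||a||_1 via ISTA (assumed solver)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the reconstruction error
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding keeps a sparse
    return a
```

With an orthonormal dictionary the solution is simply soft-thresholding of `x`, which makes a convenient sanity check; with an overcomplete `D` (more columns than rows) the same loop yields codes with many exact zeros.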
This paper proposes a novel method, called micro-deformation analysis, for analyzing and describing local image structures. The method is a general analytic tool and can be applied to any high-dimensional scalar or vector function. From it we derive a tensor matrix that serves as a descriptor of the information within local image patches. Our experimental results suggest that low-dimensional local tensor descriptors can be designed with performance comparable to the popular SIFT descriptor, the state-of-the-art feature descriptor for object recognition and categorization.
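The deformation family used by micro-deformation analysis is not specified in the abstract. To illustrate the general idea, the sketch differentiates a patch with respect to small deformation parameters at zero and sums the outer products of these derivatives over the patch. With pure translations this reduces exactly to the classical 2×2 structure tensor; the 6×6 affine variant (`affine=True`) is a hypothetical extension, not necessarily the paper's descriptor.

```python
import numpy as np

def deformation_tensor(patch, affine=False):
    """Second-moment tensor of a patch's response to small deformations (sketch).

    Translation only: J = (f_x, f_y), the classical 2x2 structure tensor.
    affine=True (hypothetical): J = d f(p + A p + t) / d(A, t) at zero,
    i.e. (x f_x, y f_x, x f_y, y f_y, f_x, f_y), giving a 6x6 tensor.
    """
    f = patch.astype(float)
    fy, fx = np.gradient(f)                # derivatives along rows (y) and cols (x)
    if affine:
        ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]].astype(float)
        ys -= ys.mean()                    # coordinates relative to the patch centre
        xs -= xs.mean()
        J = np.stack([xs * fx, ys * fx, xs * fy, ys * fy, fx, fy], axis=-1)
    else:
        J = np.stack([fx, fy], axis=-1)
    J = J.reshape(-1, J.shape[-1])         # one derivative vector per pixel
    return J.T @ J                         # sum over pixels of J J^T
```

A patch that varies only horizontally yields a tensor with a single non-zero entry in the translation case, reflecting that vertical micro-shifts leave it unchanged.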
Computer vision and image recognition research has a strong interest in dimensionality reduction techniques. Generally, these techniques are independent of the classifier being used, and the classifier is trained only after dimensionality reduction has been performed, possibly discarding valuable information. In this paper we propose an iterative algorithm that simultaneously learns a linear projection basis and a reduced set of prototypes optimized for the Nearest-Neighbor classifier. The algorithm is derived by minimizing a suitable estimate of the classification error probability. The proposed approach is assessed through a series of experiments, which show good behavior and real potential for practical applications.
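A common way to make the Nearest-Neighbor error differentiable is to compare, for each sample, the distance to the nearest same-class prototype with the distance to the nearest other-class prototype, and pass their ratio through a sigmoid. The sketch below evaluates such an estimate for a given projection `B` and prototype set `P` (both assumed inputs), together with the 1-NN rule it approximates. The abstract does not give its exact estimate, and the iterative update of `B` and `P` is not shown.

```python
import numpy as np

def _sq_dists(X, B, P):
    """Squared distances between projected samples X @ B and prototypes P @ B."""
    Z, Q = X @ B, P @ B
    return ((Z[:, None, :] - Q[None, :, :]) ** 2).sum(-1)

def nn_classify(X, B, P, py):
    """1-NN rule in the projected space: label of the closest prototype."""
    return py[_sq_dists(X, B, P).argmin(1)]

def nn_error_estimate(X, y, B, P, py, beta=10.0):
    """Smoothed 1-NN error: sigmoid of (same-class / other-class distance ratio).

    A ratio above 1 means the sample is misclassified; the sigmoid centred at 1
    turns the 0/1 error into a differentiable quantity in B and P.
    """
    d = _sq_dists(X, B, P)
    same = py[None, :] == y[:, None]
    d_same = np.where(same, d, np.inf).min(1)
    d_diff = np.where(same, np.inf, d).min(1)
    r = d_same / (d_diff + 1e-12)
    return np.mean(1.0 / (1.0 + np.exp(beta * (1.0 - r))))
```

Larger `beta` makes the estimate closer to the true 0/1 error count but harder to optimise by gradient descent.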
This paper presents a novel method for location recognition that exploits an epitomic representation to achieve both high efficiency and good generalization. A generative model based on epitomic image analysis captures the appearance and geometric structure of an environment while allowing for variations due to motion, occlusions and non-Lambertian effects. The ability to model translation and scale invariance, together with the fusion of diverse visual features, yields enhanced generalization with economical training. Experiments on both existing and new labelled image databases show recognition accuracy superior to the state of the art, with real-time computational performance.
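Assuming each location already has a learned epitome with a per-pixel Gaussian mean and variance (learning it by EM is not shown), a query patch can be scored by its best log-likelihood over all epitome positions, which is where translation invariance comes from. The sketch below classifies a set of grayscale query patches by the location whose epitome explains them best. Scale invariance and feature fusion are omitted, and the function names are hypothetical.

```python
import numpy as np

def patch_loglik(patch, mean, var):
    """Best Gaussian log-likelihood of a patch over all positions in an epitome.

    mean, var: (H, W) epitome parameters; patch: (h, w) with h <= H, w <= W.
    """
    h, w = patch.shape
    H, W = mean.shape
    best = -np.inf
    for i in range(H - h + 1):             # exhaustive search over offsets
        for j in range(W - w + 1):
            m = mean[i:i + h, j:j + w]
            v = var[i:i + h, j:j + w]
            ll = -0.5 * np.sum((patch - m) ** 2 / v + np.log(2 * np.pi * v))
            best = max(best, ll)
    return best

def recognise_location(patches, epitomes):
    """Pick the location (dict key) whose (mean, var) epitome best explains all patches."""
    scores = {name: sum(patch_loglik(p, m, v) for p in patches)
              for name, (m, v) in epitomes.items()}
    return max(scores, key=scores.get)
```

Because an epitome is much smaller than the set of training images it summarises, this exhaustive search stays cheap, which is consistent with the efficiency argument in the abstract.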