Contextual information can greatly improve boththe speed and accuracy of object recognition. Context is most often viewed as a static concept, learned from large image databases. We build upon this concept by explori...
详细信息
ISBN:
(纸本)9781479943098
Contextual information can greatly improve boththe speed and accuracy of object recognition. Context is most often viewed as a static concept, learned from large image databases. We build upon this concept by exploring cognitive context, demonstrating how rich dynamic context provided by computational cognitive models can improve object recognition. We demonstrate the use cognitive context to improve recognition using a small database of objects.
We present a novel solution to compute the relative pose of a generalized camera. Existing solutions are either not general, have too high computational complexity, or require too many correspondences, which impedes a...
详细信息
ISBN:
(纸本)9781479951178
We present a novel solution to compute the relative pose of a generalized camera. Existing solutions are either not general, have too high computational complexity, or require too many correspondences, which impedes an efficient or accurate usage within Ransac schemes. We factorize the problem as a low-dimensional, iterative optimization over relative rotation only, directly derived from well-known epipolar constraints. Common generalized cameras often consist of camera clusters, and give rise to omnidirectional landmark observations. We prove that our iterative scheme performs well in such practically relevant situations, eventually resulting in computational efficiency similar to linear solvers, and accuracy close to bundle adjustment, while using less correspondences. Experiments on both virtual and real multi-camera systems prove superior overall performance for robust, real-time multi-camera motion-estimation.
In this paper, we propose a robust method for visual tracking relying on mean shift, sparse coding and spatial pyramids. Firstly, we extend the original mean shift approach to handle orientation space and scale space ...
详细信息
ISBN:
(纸本)9781479951178
In this paper, we propose a robust method for visual tracking relying on mean shift, sparse coding and spatial pyramids. Firstly, we extend the original mean shift approach to handle orientation space and scale space and name this new method as mean transform. the mean transform method estimates the motion, including the location, orientation and scale, of the interested object window simultaneously and effectively. Secondly, a pixel-wise dense patch sampling technique and a region-wise trivial template designing scheme are introduced which enable our approach to run very accurately and efficiently. In addition, instead of using either holistic representation or local representation only, we apply spatial pyramids by combining these two representations into our approach to deal with partial occlusion problems robustly. Observed from the experimental results, our approach outperforms state-of-theart methods in many benchmark sequences.
In this paper, we propose an efficient method to reconstruct surface-from-gradients (SfG). Our method is formulated under the framework of discrete geometry processing. Unlike the existing SfG approaches, we transfer ...
详细信息
ISBN:
(纸本)9781479951178
In this paper, we propose an efficient method to reconstruct surface-from-gradients (SfG). Our method is formulated under the framework of discrete geometry processing. Unlike the existing SfG approaches, we transfer the continuous reconstruction problem into a discrete space and efficiently solve the problem via a sequence of least-square optimization steps. Our discrete formulation brings three advantages: 1) the reconstruction preserves sharp-features, 2) sparse/incomplete set of gradients can be well handled, and 3) domains of computation can have irregular boundaries. Our formulation is direct and easy to implement, and the comparisons with state-of-the-arts show the effectiveness of our method.
We propose a data-driven approach to facial landmark localization that models the correlations between each landmark and its surrounding appearance features. At runtime, each feature casts a weighted vote to predict l...
详细信息
ISBN:
(纸本)9781479951178
We propose a data-driven approach to facial landmark localization that models the correlations between each landmark and its surrounding appearance features. At runtime, each feature casts a weighted vote to predict landmark locations, where the weight is precomputed to take into account the feature's discriminative power. the feature voting-based landmark detection is more robust than previous local appearance-based detectors;we combine it with non-parametric shape regularization to build a novel facial landmark localization pipeline that is robust to scale, in-plane rotation, occlusion, expression, and most importantly, extreme head pose. We achieve state-of-the-art performance on two especially challenging in-the-wild datasets populated by faces with extreme head pose and expression.
Accurate ground truth pose is essential to the training of most existing head pose estimation algorithms. However, in many cases, the "ground truth" pose is obtained in rather subjective ways, such as asking...
详细信息
ISBN:
(纸本)9781479951178
Accurate ground truth pose is essential to the training of most existing head pose estimation algorithms. However, in many cases, the "ground truth" pose is obtained in rather subjective ways, such as asking the human subjects to stare at different markers on the wall. In such case, it is better to use soft labels rather than explicit hard labels. therefore, this paper proposes to associate a multivariate label distribution (MLD) to each image. An MLD covers a neighborhood around the original pose. Labeling the images with MLD can not only alleviate the problem of inaccurate pose labels, but also boost the training examples associated to each pose without actually increasing the total amount of training examples. Two algorithms are proposed to learn from the MLD by minimizing the weighted Jeffrey's divergence between the predicted MLD and the ground truth MLD. Experimental results show that the MLD-based methods perform significantly better than the compared state-of-the-art head pose estimation algorithms.
We have discovered that 3D reconstruction can be achieved from a single still photographic capture due to accidental motions of the photographer, even while attempting to hold the camera still. Although these motions ...
详细信息
ISBN:
(纸本)9781479951178
We have discovered that 3D reconstruction can be achieved from a single still photographic capture due to accidental motions of the photographer, even while attempting to hold the camera still. Although these motions result in little baseline and therefore high depth uncertainty, in theory, we can combine many such measurements over the duration of the capture process (a few seconds) to achieve usable depth estimates. We present a novel 3D reconstruction system tailored for this problem that produces depth maps from short video sequences from standard cameras without the need for multi-lens optics, active sensors, or intentional motions by the photographer. this result leads to the possibility that depth maps of sufficient quality for RGB-D photography applications like perspective change, simulated aperture, and object segmentation, can come "for free" for a significant fraction of still photographs under reasonable conditions.
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. the use of hidden variables makes them expressive models, but inference is only approximate...
详细信息
ISBN:
(纸本)9781479951178
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. the use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to instead use simple Markov models that only model observed quantities. We retain a highly expressive dynamic model by using interactions that are nonlinear and non-parametric. A presentation of our approach in terms of latent variables shows logarithmic growth for the computation of exact log-likelihoods in the number of latent states. We validate our model on human motion capture data and demonstrate state-of-the-art performance on action recognition and motion completion tasks.
Most modern object trackers combine a motion prior with sliding-window detection, using binary classifiers that predict the presence of the target object based on histogram features. Although the accuracy of such trac...
ISBN:
(纸本)9781479951178
Most modern object trackers combine a motion prior with sliding-window detection, using binary classifiers that predict the presence of the target object based on histogram features. Although the accuracy of such trackers is generally very good, they are often impractical because of their high computational requirements. To resolve this problem, the paper presents a new approach that limits the computational costs of trackers by ignoring features in image regions that - after inspecting a few features - are unlikely to contain the target object. To this end, we derive an upper bound on the probability that a location is most likely to contain the target object, and we ignore (features in) locations for which this upper bound is small. We demonstrate the effectiveness of our new approach in experiments with model-free and model-based trackers that use linear models in combination with HOG features. the results of our experiments demonstrate that our approach allows us to reduce the average number of inspected features by up to 90% without affecting the accuracy of the tracker.
this work analyzes the problem of homography estimation for robust target matching in the context of real-time mobile vision. We present a device-friendly implementation of the Gaussian Elimination algorithm and show ...
详细信息
ISBN:
(纸本)9781479943098
this work analyzes the problem of homography estimation for robust target matching in the context of real-time mobile vision. We present a device-friendly implementation of the Gaussian Elimination algorithm and show that our optimized approach can significantly improve the homography estimation step in a hypothesize-and-verify scheme. Experiments are performed on image sequences in which both speed and accuracy are evaluated and compared with conventional homography estimation schemes.
暂无评论