ISBN: 0769506623 (print)
Intelligent rooms equipped with video cameras can exhibit compelling behaviors, many of which depend on object recognition. Unfortunately, object recognition algorithms are rarely written with a normal consumer in mind, leading to programs that would be impractical to use for a typical person. These impracticalities include speed of execution, elaborate training rituals, and setting adjustable parameters. We present an algorithm that can be trained with only a few images of the object, that requires only two parameters to be set, and that runs at 0.7 Hz on a normal PC with a normal color camera. The algorithm represents an object's features as small, quantized edge templates, and it represents the object's geometry with "Hough kernels". The Hough kernels implement a variant of the generalized Hough transform using simple, 2D image correlation. The algorithm also uses color information to eliminate parts of the image from consideration. We give our results in terms of ROC curves for recognizing a computer keyboard with partial occlusion and background clutter. Even with two hands occluding the keyboard, the detection rate is 0.885 with a false alarm rate of 0.03.
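The voting idea behind a generalized Hough transform can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names, the offset list, and the accumulator layout are all assumptions. Each detected edge feature votes for every location where the object's reference point could lie; casting the same offsets from every detection is equivalent to correlating the feature's detection image with a kernel that holds those offsets.

```python
def hough_votes(edge_points, offsets, height, width):
    """Accumulate votes for the object's reference point.

    edge_points: (row, col) pixels where one edge feature was detected.
    offsets: (d_row, d_col) displacements from that feature to the
             reference point (hypothetical values, e.g. from training).
    """
    acc = [[0] * width for _ in range(height)]
    for r, c in edge_points:
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            # Votes that fall outside the image are discarded.
            if 0 <= rr < height and 0 <= cc < width:
                acc[rr][cc] += 1
    return acc


def peak(acc):
    """Return ((row, col), votes) for the strongest accumulator cell."""
    cells = ((r, c) for r in range(len(acc)) for c in range(len(acc[0])))
    best = max(cells, key=lambda rc: acc[rc[0]][rc[1]])
    return best, acc[best[0]][best[1]]
```

A detector built this way would threshold the peak vote count; the threshold would be a tunable parameter.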
ISBN: 0780342364 (print)
This paper describes a probabilistic decomposition of human dynamics at multiple abstractions, and shows how to propagate hypotheses across space, time, and abstraction levels. Recognition in this framework is the succession of very general low-level grouping mechanisms to increasingly specific and learned model-based grouping techniques at higher levels. Hard decision thresholds are delayed and resolved by higher-level statistical models and temporal context. Low-level primitives are areas of coherent motion found by EM clustering, mid-level categories are simple movements represented by dynamical systems, and high-level complex gestures are represented by Hidden Markov Models as successive phases of simple movements. We show how such a representation can be learned from training data, and apply it to the example of human gait recognition.
ISBN: 0818672587 (print)
Current computer vision systems whose basic methodology is open-loop or filter type typically use image segmentation followed by object recognition algorithms. These systems are not robust for most real-world applications. In contrast, the system presented here achieves robust performance by using reinforcement learning to induce a mapping from input images to corresponding segmentation parameters. This is accomplished by using the confidence level of model matching as a reinforcement signal for a team of learning automata to search for segmentation parameters during training. The use of the recognition algorithm as part of the evaluation function for image segmentation gives rise to significant improvement of the system performance by automatic generation of recognition strategies. The system is verified through experiments on sequences of color images with varying external conditions.
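A single learning automaton of the kind mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the textbook linear reward-inaction update, which the abstract does not confirm as the authors' rule, and the function names and the learning rate are hypothetical. Each action would stand for one candidate value of a segmentation parameter, and the reinforcement would be a matching confidence scaled to [0, 1].

```python
import random


def choose(probs, rng=random):
    """Sample an action index from the automaton's action probabilities."""
    x = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if x < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def reward_update(probs, chosen, reinforcement, rate=0.1):
    """Linear reward-inaction update; reinforcement must lie in [0, 1].

    The chosen action gains probability in proportion to the reward,
    all other actions shrink, and the vector keeps summing to one.
    """
    step = rate * reinforcement
    return [
        p + step * (1.0 - p) if i == chosen else p * (1.0 - step)
        for i, p in enumerate(probs)
    ]
```

With a reinforcement of zero the probabilities stay unchanged, so poor segmentations neither help nor hurt in this variant.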
ISBN: 0780342364 (print)
In this contribution we present an algorithm for tracking non-rigid, moving objects in a sequence of colored images, which were recorded by a non-stationary camera. The application background is vision-based driving assistance in the inner city. In an initial step, object parts are determined by a divisive clustering algorithm, which is applied to all pixels in the first image of the sequence. The feature space is defined by the color and position of a pixel. For each new image the clusters of the previous image are adapted iteratively by a parallel k-means clustering algorithm. Instead of tracking single points, edges, or areas over a sequence of images, only the centroids of the clusters are tracked. The proposed method remarkably simplifies the correspondence problem and also ensures a robust tracking behavior.
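The per-frame adaptation step can be illustrated with plain k-means seeded by the previous frame's centroids. This is a sketch under assumptions, not the authors' parallel implementation: the feature layout (r, g, b, x, y), the unweighted Euclidean distance, and the fixed iteration count are all hypothetical choices. Because cluster k in the new frame starts from cluster k of the old frame, the index itself carries the correspondence.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adapt_centroids(features, centroids, iterations=5):
    """Run k-means on the new frame, starting from the old centroids.

    features: one (r, g, b, x, y) tuple per pixel of the new frame.
    centroids: centroids from the previous frame, in a fixed order.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for f in features:
            nearest = min(range(len(centroids)),
                          key=lambda k: sq_dist(f, centroids[k]))
            groups[nearest].append(f)
        # An empty cluster keeps its old centroid instead of vanishing.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids
```

In practice color and position would likely need different scaling before being mixed in one distance; the sketch leaves them unscaled.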
ISBN: 0780342364 (print)
It is widely accepted that textureless surfaces cannot be recovered using passive sensing techniques. The problem is approached by viewing image formation as a fully three-dimensional mapping. It is shown that the lens encodes structural information of the scene within a compact three-dimensional space behind it. After analyzing the information content of this space and by using its properties we derive necessary and sufficient conditions for the recovery of textureless scenes. Based on these conditions, a simple procedure for recovering textureless scenes is described. We experimentally demonstrate the recovery of three textureless surfaces, namely, a line, a plane, and a paraboloid. Since textureless surfaces represent the worst case recovery scenario, all the results and the recovery procedure are naturally applicable to scenes with texture.
ISBN: 9780769549897 (print)
Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of our proposed approach.
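The bit-by-bit decoding idea can be illustrated with a generic output-code decoder. This is a sketch under assumptions rather than the paper's method: it presumes a ternary coding matrix with entries +1, -1, or 0 (0 meaning the class ignores that bit), independent bit predictors that output probabilities, and a simple log-likelihood score. The function name and the example matrix are hypothetical.

```python
import math


def decode(bit_probs, coding_matrix):
    """Return the index of the class whose codeword best explains the bits.

    bit_probs[b]: predicted probability that bit b equals +1.
    coding_matrix[k][b]: +1, -1, or 0 (class k ignores bit b).
    """
    eps = 1e-12  # keeps log() finite when a predictor outputs 0 or 1
    best_class, best_score = None, -math.inf
    for k, codeword in enumerate(coding_matrix):
        score = 0.0
        for p, code in zip(bit_probs, codeword):
            if code == 1:
                score += math.log(max(p, eps))
            elif code == -1:
                score += math.log(max(1.0 - p, eps))
            # code == 0 contributes nothing to this class's score
        if score > best_score:
            best_class, best_score = k, score
    return best_class
```

Note that this score favors codewords with more zeros, since they accumulate fewer negative terms; the sketch does not correct for that.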
ISBN: 0780342364 (print)
We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm, and a clear Bayesian semantics. However, the Markovian framework makes strong restrictive assumptions about the system generating the signal: that it is a single process having a small number of states and an extremely limited state memory. The single-process model is often inappropriate for vision (and speech) applications, resulting in low ceilings on model performance. Coupled HMMs provide an efficient way to resolve many of these problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions.
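To make the coupling concrete, the sketch below evaluates the likelihood of two observation sequences when each chain's next state depends on the previous states of both chains. It is an assumption-laden illustration, not the paper's algorithm: it runs the exact forward recursion over the joint state space, and the function name and table layouts are hypothetical.

```python
def coupled_forward(obs_a, obs_b, init, trans_a, trans_b, emit_a, emit_b):
    """Likelihood of two sequences under a two-chain coupled HMM.

    init[i][j]: P(a_0 = i, b_0 = j)
    trans_a[i][j][k]: P(a_t = k | a_{t-1} = i, b_{t-1} = j)
    trans_b[i][j][l]: P(b_t = l | a_{t-1} = i, b_{t-1} = j)
    emit_a[i][o], emit_b[j][o]: per-chain emission probabilities
    """
    na, nb = len(emit_a), len(emit_b)
    # alpha[i][j] = P(observations so far, a_t = i, b_t = j)
    alpha = [[init[i][j] * emit_a[i][obs_a[0]] * emit_b[j][obs_b[0]]
              for j in range(nb)] for i in range(na)]
    for oa, ob in zip(obs_a[1:], obs_b[1:]):
        alpha = [[emit_a[k][oa] * emit_b[l][ob] *
                  sum(alpha[i][j] * trans_a[i][j][k] * trans_b[i][j][l]
                      for i in range(na) for j in range(nb))
                  for l in range(nb)] for k in range(na)]
    return sum(sum(row) for row in alpha)
```

This exact recursion costs on the order of (na * nb)^2 operations per time step, which grows quickly with the number of states and chains.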
ISBN: 9781424469840 (print)
We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.
ISBN: 9781728171685 (print)
We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans. Deep global registration is based on three modules: a 6-dimensional convolutional network for correspondence confidence prediction, a differentiable Weighted Procrustes algorithm for closed-form pose estimation, and a robust gradient-based SE(3) optimizer for pose refinement. Experiments demonstrate that our approach outperforms state-of-the-art methods, both learning-based and classical, on real-world data.
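The closed-form pose step can be illustrated with a weighted Procrustes fit. Note the swapped-in simplification: the sketch below works in 2D, where the optimal rotation reduces to a single angle, whereas the abstract concerns 3D scans and SE(3). The function name is hypothetical, and the weights stand in for correspondence confidences.

```python
import math


def weighted_procrustes_2d(src, dst, weights):
    """Find rotation angle theta and translation t minimizing
    sum_i w_i * |R(theta) * src_i + t - dst_i|^2 (2D simplification)."""
    total = sum(weights)
    # Weighted centroids of both point sets.
    sx = sum(w * p[0] for w, p in zip(weights, src)) / total
    sy = sum(w * p[1] for w, p in zip(weights, src)) / total
    dx = sum(w * q[0] for w, q in zip(weights, dst)) / total
    dy = sum(w * q[1] for w, q in zip(weights, dst)) / total
    # Weighted cross and dot sums of the centered points give the angle.
    cross = dot = 0.0
    for w, p, q in zip(weights, src, dst):
        px, py = p[0] - sx, p[1] - sy
        qx, qy = q[0] - dx, q[1] - dy
        cross += w * (px * qy - py * qx)
        dot += w * (px * qx + py * qy)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the target one.
    t = (dx - (c * sx - s * sy), dy - (s * sx + c * sy))
    return theta, t
```

A correspondence with weight zero has no influence on the result, which is how low-confidence matches would be suppressed in this scheme.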