ISBN: 0818672587 (print)
This paper presents a new visual motion cue, which we call the Visual Threat Cue (VTC), that provides a measure of the relative change in range, as well as the clearance, between a 3D surface and a fixating observer in motion. The VTC corresponds to visual fields surrounding a moving observer. The fields are time-based imaginary 3D surfaces that move with the observer. They are analogous to the equipotential fields of an electric dipole. A practical method to extract the VTC is presented. The approach is independent of the 3D surface texture and needs no optical flow information, 3D reconstruction, segmentation, feature tracking, or pre-processing. The algorithm was applied to several indoor and outdoor real images of textures, and similar behavior was observed for most of the textures employed.
Foveated vision and two-mode tracking, as inspired by the human oculomotor system, are often used in active vision systems. The purpose of this paper is to answer the following basic questions, which arise from implementations. First, is it beneficial to have foveated vision, and what is the optimal size of the foveal window? Second, is there a need for two control mechanisms (smooth pursuit and saccade) for improved performance, and how can one efficiently switch between them? To this end, a setup is proposed in which these strategies can be evaluated in a systematic manner. It is shown that the fovea appears as a compromise between the tightness of the tracking specifications and computational constraints. By introducing a model for the latter and postulating some a priori knowledge of the target behavior, it is possible to compute the size of the fovea in an optimal way. As a by-product, 'smooth pursuit' can be defined in a natural way, and the use of a two-mode tracking scheme is justified. The second mode, i.e., 'saccadic control', aims at re-centering the target on the fovea so that the smooth pursuit controller can continue to operate. It is shown that a control strategy can indeed be defined so that this objective can be met under appropriate operating conditions.
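The switching idea in the abstract above can be illustrated with a toy one-dimensional tracker. This is only a sketch under stated assumptions, not the paper's controller: the function name `track`, the proportional pursuit law, and the values of `fovea_half_width` and `pursuit_gain` are all hypothetical choices made for illustration.

```python
def track(target_positions, fovea_half_width=2.0, pursuit_gain=0.5):
    """Toy two-mode tracker on a 1D target trajectory.

    Assumption: smooth pursuit is a simple proportional correction and a
    saccade is an instantaneous re-centering; neither is taken from the source.
    """
    gaze = 0.0
    modes = []
    for target in target_positions:
        error = target - gaze
        if abs(error) <= fovea_half_width:
            # Target is inside the fovea: correct the gaze gradually.
            gaze += pursuit_gain * error
            modes.append("pursuit")
        else:
            # Target left the fovea: jump so pursuit can resume.
            gaze = target
            modes.append("saccade")
    return gaze, modes


# The target drifts slowly, then jumps far outside the foveal window.
final_gaze, modes = track([1.0, 2.0, 10.0, 11.0])
print(modes)       # mode chosen at each time step
print(final_gaze)  # gaze position after the last step
```

In this toy setting, a wider foveal window triggers fewer saccades but, in a real system, would cost more computation per frame, which is the trade-off the abstract describes.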
We evaluated six algorithms for computing egomotion from image velocities. We established benchmarks for quantifying bias and sensitivity to noise, and for quantifying the convergence properties of those algorithms that require numerical search. Our simulations reveal some interesting and surprising results. First, it is often written in the literature that the egomotion problem is difficult because translation (e.g., along the X-axis) and rotation (e.g., about the Y-axis) produce similar image velocities. We found, to the contrary, that the bias and sensitivity of our six algorithms are totally invariant with respect to the axis of rotation. Second, it is also believed by some that fixating helps to make the egomotion problem easier. We found, to the contrary, that fixating does not help when the noise is independent of the image velocities. Fixation does help if the noise is proportional to speed, but only for the trivial reason that the speeds are slower under fixation. Third, it is widely believed that increasing the field of view will yield better performance. We found, to the contrary, that this is not necessarily true.
We present ordinal measures for establishing image correspondence. Linear correspondence measures like correlation and the sum of squared differences are known to be fragile. Ordinal measures, which are based on the relative ordering of intensity values in windows, have demonstrable robustness to depth discontinuities, occlusion, and noise. The relative ordering of intensity values in each window is represented by a rank permutation, which is obtained by sorting the corresponding intensity data. By using a novel distance metric between the rank permutations, we arrive at ordinal correlation coefficients. These coefficients are independent of absolute intensity scale, i.e., they are normalized measures. Further, since rank permutations are invariant to monotone transformations of the intensity values, the coefficients are unaffected by nonlinear effects like gamma variation between images. We have developed a simple algorithm for their efficient implementation. Experiments suggest the superiority of ordinal measures over existing techniques under non-ideal conditions. Though we present ordinal measures in the context of stereo, they serve as a general tool for image matching that is applicable to other vision problems such as motion estimation and image registration.
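The invariance claim above can be checked with a small sketch. It computes rank permutations for two windows and compares them with Spearman's rank correlation, which is a classical ordinal measure used here as a stand-in; it is not the novel distance metric the abstract refers to. The helper names and the sample window values are hypothetical, and distinct intensities (no ties) are assumed.

```python
def rank_permutation(window):
    """Return the rank (0 = darkest) of each pixel in a flattened window."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def spearman_rho(window_a, window_b):
    """Spearman rank correlation of two equal-size windows (no ties assumed)."""
    n = len(window_a)
    ra, rb = rank_permutation(window_a), rank_permutation(window_b)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# A 3x3 window flattened row by row, with distinct intensities in [0, 255].
left = [10, 200, 35, 90, 120, 60, 250, 15, 180]
# The same window after a gamma change (a monotone intensity transformation).
right = [255.0 * (v / 255.0) ** 2.2 for v in left]
# A window with unrelated structure, for comparison.
other = [200, 10, 90, 35, 250, 180, 15, 120, 60]

rho_same = spearman_rho(left, right)   # ranks are identical under gamma
rho_other = spearman_rho(left, other)  # ranks disagree
print(rho_same, rho_other)
```

Because the gamma curve preserves the ordering of intensities, the two rank permutations coincide and the coefficient reaches its maximum, whereas a linear measure such as the sum of squared differences would report a large mismatch for the same pair.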
We present a new method for the 3D model-based tracking of human body parts. To mitigate the difficulties arising from occlusion among body parts, we employ multiple calibrated cameras in a mutually orthogonal configuration. In addition, we develop criteria for a time-varying active selection of a set of cameras to track the motion of a particular human part. In particular, at every frame, each camera tracks a number of parts depending on the visibility of these parts and the observability of their predicted motion from the specific camera. To relate points on the occluding contours of the parts to points on their models, we apply concepts from projective geometry. Then, within the physics-based framework, we compute the generalized forces applied from the parts' occluding contours to model points of the body parts. These forces update the translational and rotational degrees of freedom of the model so as to minimize the discrepancy between the sensory data and the estimated model state. We present initial tracking results from a series of experiments involving the recovery of complex 3D motions in the presence of significant occlusion.
We consider the problem of feature-based face recognition in the setting where only a single example of each face is available for training. The mixture-distance technique we introduce achieves a recognition rate of 95% on a database of 685 people in which each face is represented by 30 measured distances. This is currently the best recorded recognition rate for a feature-based system applied to a database of this size. By comparison, nearest neighbor search using Euclidean distance yields 84%. In our work, a novel distance function is constructed based on local second-order statistics, as estimated by modeling the training data as a mixture of normal densities. We report on the results from mixtures of several sizes. We demonstrate that a flat mixture of mixtures performs as well as the best model and therefore represents an effective solution to the model selection problem. A mixture perspective is also taken for individual Gaussians to choose between first-order (variance) and second-order (covariance) models. Here an approximation to flat combination is proposed and seen to perform well in practice. Our results demonstrate that, even in the absence of multiple training examples for each class, it is sometimes possible to infer from a statistical model of training data a significantly improved distance function for use in pattern recognition.
Faces represent complex, multidimensional, meaningful visual stimuli, and developing a computational model for face recognition is difficult. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample. The convolutional neural network provides partial invariance to translation, rotation, scale, and deformation. The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person, the proposed method and eigenfaces result in 3.8% and 10.5% error, respectively. The recognizer provides a measure of confidence in its output, and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details.
We present a vision system for the 3-D model-based tracking of unconstrained human movement. Using image sequences acquired simultaneously from multiple views, we recover the 3-D body pose at each time instant without the use of markers. The pose-recovery problem is formulated as a search problem and entails finding the pose parameters of a graphical human model whose synthesized appearance is most similar to the actual appearance of the real human in the multi-view images. The models used for this purpose are acquired from the images. We use a decomposition approach and a best-first technique to search through the high-dimensional pose parameter space. A robust variant of chamfer matching is used as a fast similarity measure between synthesized and real edge images. We present initial tracking results from a large new Humans-In-Action (HIA) database containing more than 2500 frames in each of four orthogonal views. The sequences contain subjects involved in a variety of activities of various degrees of complexity, ranging from simple one-person hand waving to the challenging two-person close interaction in the Argentine Tango.
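As a rough illustration of the similarity measure mentioned above, the sketch below computes a basic chamfer distance between two sets of edge pixels by brute force. It is not the paper's robust variant, which the abstract does not specify; the optional `cap` parameter, the function name, and the sample edge sets are hypothetical.

```python
import math


def chamfer_distance(model_edges, image_edges, cap=None):
    """Mean distance from each model edge pixel to its nearest image edge pixel.

    `cap`, if given, truncates each per-pixel distance. This truncation is an
    assumed robustification for illustration, not the method from the source.
    """
    if not model_edges or not image_edges:
        raise ValueError("both edge sets must be non-empty")
    total = 0.0
    for (mr, mc) in model_edges:
        nearest = min(math.hypot(mr - ir, mc - ic) for (ir, ic) in image_edges)
        total += nearest if cap is None else min(nearest, cap)
    return total / len(model_edges)


# Edge pixels (row, col) of a short vertical segment observed in the image.
image_edges = [(r, 5) for r in range(10)]
# Two candidate pose hypotheses rendered as synthetic edge pixels.
pose_good = [(r, 6) for r in range(10)]  # one pixel to the right
pose_bad = [(r, 9) for r in range(10)]   # four pixels to the right

score_good = chamfer_distance(pose_good, image_edges)
score_bad = chamfer_distance(pose_bad, image_edges)
score_bad_capped = chamfer_distance(pose_bad, image_edges, cap=2.0)
print(score_good, score_bad, score_bad_capped)
```

A lower score means the synthesized edges lie closer to the observed ones, so a pose search would prefer `pose_good`. Practical systems usually precompute a distance transform of the edge image once so that each hypothesis is scored by table lookup instead of the nested search used here.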