ISBN: 0818672587 (print)
Automatic target recognition (ATR) applications simultaneously require a wide field of view (FOV) for better detection and situation awareness, high resolution for target recognition and threat assessment, and a high frame rate for detecting brief events and disambiguating frame-to-frame correlation. Uniformly sampling the entire FOV at recognition resolution is wasteful in ATR scenarios with localized regions of interest (ROIs). Foveal data acquisition with space-variant sampling and context-sensitive sensor articulation is highly optimized for active ATR applications. We propose a multiscale local Zernike filter-based front-end target detection technique for a commercially feasible foveal sensor topology with a piecewise constant resolution profile. Anisotropic heat diffusion is employed for preprocessing of the foveal data. Expansion template matching is used to derive a detection filter that optimizes the discriminant signal-to-noise ratio (SNR). Results are presented with simulated foveal imagery derived from real uniform-acuity FLIR data.
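The abstract names anisotropic heat diffusion as the preprocessing step but gives no formulation. The sketch below assumes the common Perona-Malik scheme on a uniform grid (not the foveal topology of the paper); the function name `perona_malik` and the parameters `kappa` and `step` are illustrative choices, not taken from the source.

```python
import math

def perona_malik(img, n_iter=10, kappa=0.5, step=0.2):
    """Edge-preserving smoothing of a 2D list of floats.

    Assumed Perona-Malik form: each pixel exchanges intensity with its
    4 neighbours, weighted by a conductance that shrinks across large
    intensity differences. step should stay at or below 0.25.
    """
    h, w = len(img), len(img[0])
    cur = [row[:] for row in img]
    for _ in range(n_iter):
        nxt = [row[:] for row in cur]
        for y in range(h):
            for x in range(w):
                flux = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        d = cur[yy][xx] - cur[y][x]
                        # Conductance is near 1 for small differences
                        # (noise) and near 0 across strong edges.
                        flux += math.exp(-(d / kappa) ** 2) * d
                nxt[y][x] = cur[y][x] + step * flux
        cur = nxt
    return cur

# Toy row: small noise on the left, a strong edge between columns 3 and 4.
row = [[0.0, 0.1, 0.0, 0.0, 10.0, 10.0, 9.9, 10.0]]
smoothed = perona_malik(row)
```

In this toy row the small fluctuations on either side are flattened while the large step between the two halves is left essentially intact, which is the property that makes this kind of diffusion attractive before a detection filter.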
ISBN: 9780769549897 (print)
Scene text recognition has attracted great interest from the computer vision community in recent years. In this paper, we propose a novel scene text recognition method using part-based tree-structured character detection. Unlike the conventional multi-scale sliding-window character detection strategy, which does not make use of character-specific structure information, we use a part-based tree structure to model each type of character so as to detect and recognize the characters at the same time. For word recognition, we build a Conditional Random Field model on the potential character locations to incorporate the detection scores, spatial constraints and linguistic knowledge into one framework. The final word recognition result is obtained by minimizing the cost function defined on the random field. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method significantly outperforms state-of-the-art methods for both character detection and word recognition.
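The cost minimization over the random field can be illustrated with a small sketch. It assumes a chain-structured field (one node per candidate character location, pairwise terms between neighbours only), which the abstract does not specify; the function `min_cost_labeling`, the toy unary costs, and the bigram table are all hypothetical.

```python
def min_cost_labeling(unary, pairwise):
    """Return (cost, labels) minimizing unary + pairwise costs on a chain.

    unary: list of dicts, unary[i][label] = cost of label at position i
           (standing in for a detection score).
    pairwise: function (prev_label, label) -> cost
           (standing in for spatial and linguistic terms).
    """
    # best[label] = (cost of cheapest labeling ending in label, labeling)
    best = {lab: (c, [lab]) for lab, c in unary[0].items()}
    for node in unary[1:]:
        new_best = {}
        for lab, c in node.items():
            prev_cost, prev_path = min(
                ((pc + pairwise(pl, lab), pp) for pl, (pc, pp) in best.items()),
                key=lambda t: t[0],
            )
            new_best[lab] = (prev_cost + c, prev_path + [lab])
        best = new_best
    return min(best.values(), key=lambda t: t[0])

# Toy example: three character positions, two candidates each.
unary = [{"c": 0.2, "e": 0.5}, {"a": 0.3, "o": 0.25}, {"t": 0.1, "l": 0.6}]
frequent_bigrams = {("c", "a"), ("a", "t")}  # hypothetical language prior

def bigram_cost(prev, cur):
    return 0.0 if (prev, cur) in frequent_bigrams else 1.0

cost, labels = min_cost_labeling(unary, bigram_cost)
word = "".join(labels)
```

Note that position 2 prefers "o" on detection score alone, but the pairwise term pulls the joint minimum to "cat". This is dynamic programming (Viterbi-style) and is exact only for chains or trees; a field with loops would need a different inference method.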
ISBN: 0818672587 (print)
This paper proposes a method for detecting obstacles on a runway by controlling their expected disparities. By approximating the runway by a planar surface, the initial model flow field (MFF) corresponding to an obstacle-free runway is described by the data from onboard sensors (OBS). The error variance of the initial MFF is computed and used to estimate the MFF. Obstacles are detected by comparing the expected residual flow disparities with the residual flow field (RFF) estimated after warping (or stabilizing) an image using the MFF. Expected temporal and spatial disparities are obtained from the use of the OBS. This allows us to control the residual disparities by increasing the temporal baseline and/or by utilizing the spatial baseline if distant objects cannot be detected for a given temporal baseline. Experimental results for two real flight image sequences are presented.
ISBN: 0818672587 (print)
In stereo algorithms with more than two cameras, improved accuracy is often reported because such algorithms are robust against noise. However, another important aspect of polynocular stereo, the ability to detect occlusion, has received less attention. We analyzed occlusion in the camera matrix stereo (SEA) in depth and developed a simple but effective method to detect the presence of occlusion and to eliminate its effect in the correspondence search. By considering several statistics on occlusion and accuracy in the SEA, we derived a few base masks which represent occlusion patterns and are effective for the detection of occlusion. Several experiments using typical indoor scenes showed good performance in obtaining dense and accurate depth maps, even at the occluding boundaries of objects.
ISBN: 0780342364 (print)
We have designed and implemented a real-time binocular tracking system which uses two independent cues commonly found in the primary functions of biological visual systems to robustly track moving targets in complex environments, without a-priori knowledge of the target shape or texture: a fast optical flow segmentation algorithm quickly locates independently moving objects for target acquisition and provides a reliable velocity estimate for smooth tracking. In parallel, target position is generated from the output of a zero-disparity filter where a phase-based disparity estimation technique allows dynamic control of the camera vergence to adapt the horopter geometry to the target location. The system takes advantage of the optical properties of our custom-designed foveated wide-angle lenses, which exhibit a wide field of view along with a high resolution fovea. Methods to cope with the distortions introduced by the space-variant resolution, and a robust real-time implementation on a high performance active vision head are presented.
ISBN: 0780342364 (print)
In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy; rendering conditions from surface shape in shape-from-shading; face identity and head pose in face recognition; or font and letter class in character recognition. We refer to these two factors generically as "style" and "content". Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating data from novel content classes and style to a known style or content. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.
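The abstract does not write out the model. As a rough illustration, the sketch below uses the generic form of a bilinear map, in which each observation component is y[k] = sum over i, j of w[i][j][k] * style[i] * content[j]. The function `bilinear_render` and the toy tensor `w` are hypothetical, and fitting the model to data is not shown.

```python
def bilinear_render(w, style, content):
    """Observation y with y[k] = sum_i sum_j w[i][j][k] * style[i] * content[j].

    w: nested list indexed [style dim][content dim][output dim].
    """
    n_out = len(w[0][0])
    return [
        sum(
            w[i][j][k] * style[i] * content[j]
            for i in range(len(style))
            for j in range(len(content))
        )
        for k in range(n_out)
    ]

# Toy interaction tensor: style 0 passes content through, style 1 swaps
# the two output channels.
w = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]
content = [2, 3]
y_style_a = bilinear_render(w, [1, 0], content)  # content seen under style 0
y_style_b = bilinear_render(w, [0, 1], content)  # same content, style 1
```

With the style vector held fixed the map is linear in the content vector, and vice versa. That is what allows a known content vector to be re-rendered under another style simply by swapping the style vector, as in the last two lines.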
ISBN: 0780342364 (print)
Almost all work on texture in the computer vision and graphics communities has modeled the texture as tangential, i.e. lying in the tangent plane to the surface. This is equivalent to thinking of the texture as a pattern painted on the surface. Three-dimensional textures, where the elements may point out of the surface, have largely been ignored. We study a special class of 3D textures, perpendicular textures, where we can model the elements as being normal to the surface. The perspective projection of perpendicularly textured surfaces results in several interesting phenomena which do not occur in the much-studied tangential texture case. These include occlusion, foreshortening and illumination. In this paper, we study the geometry of the problem, modeling the locations of the elements of the texture as being a realization of a spatial point process. Relations between slant and tilt of the surface, density and height of elements and occlusions are derived. Occlusions can now be used as a cue to infer shape, instead of being treated as a source of error.
ISBN: 0818672587 (print)
An automatic target recognition (ATR) classifier is proposed that uses modularly cascaded vector quantizers (VQs) and multilayer perceptrons (MLPs). A dedicated VQ codebook is constructed for each target class at a specific range of aspects, which is trained with the K-means algorithm and a modified learning vector quantization (LVQ) algorithm. Each final codebook is expected to give the lowest mean squared error (MSE) for its correct target class at a given range of aspects. These MSEs are then processed by an array of window MLPs and a target MLP consecutively. In the spatial domain target recognition rates of 90.3 and 65.3 percent are achieved for moderately and highly cluttered test sets, respectively. Using the wavelet decomposition with an adaptive and independent codebook per subband, the VQs alone have produced recognition rates of 98.7 and 69.0 percent on more challenging training and test sets, respectively.
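The idea that each class codebook should give the lowest MSE on its own class can be sketched as follows. This is a simplification under stated assumptions: the codebooks are hard-coded rather than trained with K-means and LVQ, the class names are hypothetical, and the decision is a plain minimum over MSEs, whereas the abstract passes the MSEs on to window MLPs and a target MLP.

```python
def quantization_mse(vectors, codebook):
    """Mean squared error when each vector is replaced by its nearest codeword."""
    total = 0.0
    for v in vectors:
        # Squared distance to the closest codeword in this codebook.
        total += min(
            sum((a - b) ** 2 for a, b in zip(v, cw)) for cw in codebook
        )
    return total / (len(vectors) * len(vectors[0]))

def classify_by_codebook(vectors, codebooks):
    """Return (best class name, per-class MSEs) by minimum quantization error."""
    mses = {name: quantization_mse(vectors, cb) for name, cb in codebooks.items()}
    return min(mses, key=mses.get), mses

# Hypothetical two-class setup with 2D feature vectors.
codebooks = {
    "class_a": [[0.0, 0.0], [1.0, 1.0]],
    "class_b": [[5.0, 5.0], [6.0, 6.0]],
}
blocks = [[0.0, 1.0], [1.0, 1.0]]  # feature vectors from one test image
label, mses = classify_by_codebook(blocks, codebooks)
```

Here the per-class MSE values play the role of the features that the later MLP stages would consume; taking the minimum directly is only the simplest possible decision rule.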
ISBN: 0818672587 (print)
A common factor in all illusory contour figures is the perception of a surface occluding part of a background. In our previous work, we have shown that we could diffuse a proper set of junction hypotheses (what is salient or background) to obtain surfaces whose boundaries represent illusory contours. Amodal completions emerge at the overlapping surfaces. We address the problem of selecting the best image organization (set of hypotheses). We propose an optimization criterion based on a coherence measure between pairs of junctions (correlation between the diffusion of each pair). A statistical physics approach is applied to select the best organization. The experiments suggest that, despite the large number of possible organizations, our approach may take only a few steps (in organization space) to select the best one.
ISBN: 0780342364 (print)
This paper summarizes a novel logic-based approach to grouping and perceptual organization (presented more thoroughly in [2]) and presents novel, efficient methods for computing interpretations in this framework. Grouping interpretations are first defined as logical structures, built out of atomic premises ("regularities") that are derived from considerations of non-accidentalness. These interpretations can then be partially ordered by their degree of regularity or constraint (measured numerically by their codimension). The Genericity Constraint, the principle that interpretations should minimize coincidences in the observed configuration, dictates that the preferred interpretation will be the minimum in this partial order, i.e. the interpretation with maximum codimension. The preferred interpretation, called the qualitative parse, corresponds neatly to the interpretation intuitively preferred by human observers. As a side effect, the "most salient" or most structured part of the scene can be identified as the highest-codimension subtree of the qualitative parse. An efficient (O(n^2)) method for computing the maximum codimension interpretation is presented, along with examples.