ISBN: 0818672587 (print)
An automatic target recognition (ATR) classifier is proposed that uses modularly cascaded vector quantizers (VQs) and multilayer perceptrons (MLPs). A dedicated VQ codebook is constructed for each target class at a specific range of aspects, which is trained with the K-means algorithm and a modified learning vector quantization (LVQ) algorithm. Each final codebook is expected to give the lowest mean squared error (MSE) for its correct target class at a given range of aspects. These MSEs are then processed consecutively by an array of window MLPs and a target MLP. In the spatial domain, target recognition rates of 90.3 and 65.3 percent are achieved for moderately and highly cluttered test sets, respectively. Using the wavelet decomposition with an adaptive and independent codebook per subband, the VQs alone have produced recognition rates of 98.7 and 69.0 percent on more challenging training and test sets, respectively.
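The idea that each class-specific codebook should yield the lowest MSE for its own class can be illustrated with a minimal sketch. It assumes toy, made-up codebooks and class names, and it simply picks the codebook with the smallest MSE; the window MLPs and target MLP that the abstract describes as the actual decision stage are not modeled here.

```python
def mse_to_codebook(vectors, codebook):
    """Mean squared quantization error of `vectors` against one codebook."""
    total = 0.0
    for v in vectors:
        # Each input vector is encoded by its nearest code vector.
        total += min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook)
    return total / (len(vectors) * len(vectors[0]))


def classify_by_min_mse(vectors, codebooks):
    """Return (best_label, per-codebook MSEs) for a dict label -> codebook."""
    errors = {label: mse_to_codebook(vectors, cb) for label, cb in codebooks.items()}
    return min(errors, key=errors.get), errors


# Hypothetical two-class example; the labels and numbers are illustrative only.
codebooks = {
    "tank": [(0.0, 0.0), (1.0, 1.0)],
    "truck": [(5.0, 5.0), (6.0, 6.0)],
}
blocks = [(0.1, -0.1), (0.9, 1.2)]
label, errors = classify_by_min_mse(blocks, codebooks)
```

In the described system the vector of per-codebook MSEs (here `errors`) would be passed on to the MLP stages rather than thresholded directly.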
ISBN: 0818672587 (print)
The paper presents an analysis of the stability of pose estimation. The investigated pose estimation technique is based on the orientations of three edge segments and provides the rotation part of the object pose. The analysis focuses on how the stability varies with viewpoint relative to an object. The stability investigation propagates the uncertainty in edge segment orientations to the resulting effect on the pose parameters. It is shown that noise sensitivity varies very strongly over the range of viewpoints and that the viewpoints offering the highest robustness to noise can be determined in advance. Experiments on real images verify the theoretical results and show that, depending on viewpoint, pose parameter variance varies from 0.05 to 20 (degrees squared).
ISBN: 9781665469463 (digital and print)
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks. Such models are generally either cross-modal (contrastive) or multi-modal (with earlier fusion) but not both, and they often target only specific modalities or tasks. A promising direction would be to use a single holistic universal model, as a "foundation", that targets all modalities at once: a true vision and language foundation model should be good at vision tasks, language tasks, and cross- and multi-modal vision and language tasks. We introduce FLAVA as such a model and demonstrate impressive performance on a wide range of 35 tasks spanning these target modalities.
ISBN: 0780342364 (print)
This paper summarizes a novel logic-based approach to grouping and perceptual organization (presented more thoroughly in [2]), and presents novel efficient methods for computing interpretations in this framework. Grouping interpretations are first defined as logical structures, built out of atomic premises ("regularities") that are derived from considerations of non-accidentalness. These interpretations can then be partially ordered by their degree of regularity or constraint (measured numerically by their codimension). The Genericity Constraint, the principle that interpretations should minimize coincidences in the observed configuration, dictates that the preferred interpretation will be the minimum in this partial order, i.e. the interpretation with maximum codimension. The preferred interpretation, called the qualitative parse, corresponds neatly to the interpretation intuitively preferred by human observers. As a side effect, the "most salient" or most structured part of the scene can be identified as the highest-codimension subtree of the qualitative parse. An efficient (O(n^2)) method for computing the maximum-codimension interpretation is presented, along with examples.
ISBN: 9780769549897 (print)
We propose a novel unsupervised method for discovering recurring patterns from a single view. A key contribution of our approach is the formulation and validation of a joint assignment optimization problem where multiple visual words and object instances of a potential recurring pattern are considered simultaneously. The optimization is achieved by a greedy randomized adaptive search procedure (GRASP) with moves specifically designed for fast convergence. We have systematically quantified the performance of our approach under stressed input conditions (missing features, geometric distortions). We demonstrate that our proposed algorithm outperforms state-of-the-art methods for recurring pattern discovery on a diverse set of 400+ real-world and synthesized test images.
ISBN: 9781665445092 (print)
We address the problem of performing backpropagation for computation graphs involving 3D transformation groups SO(3), SE(3), and Sim(3). 3D transformation groups are widely used in 3D vision and robotics, but they do not form vector spaces and instead lie on smooth manifolds. The standard backpropagation approach, which embeds 3D transformations in Euclidean spaces, suffers from numerical difficulties. We introduce a new library, which exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds. We show that our approach is numerically more stable, easier to implement, and beneficial to a diverse set of tasks.
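Working in the tangent space can be illustrated for SO(3) with a small sketch. It is not the API of the library introduced in the paper, which the abstract does not describe; the function names are hypothetical. An update is expressed as a 3-vector in the tangent space, mapped to a rotation with the exponential map (Rodrigues' formula), and composed with the current rotation, so the result stays on the manifold.

```python
import math


def hat(w):
    """Skew-symmetric matrix of a 3-vector (element of the Lie algebra so(3))."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos(t))/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def retract(R, delta):
    """Apply a tangent-space step `delta` to rotation R: R * exp(delta)."""
    return matmul(R, so3_exp(delta))


identity = so3_exp((0.0, 0.0, 0.0))
quarter_turn = retract(identity, (0.0, 0.0, math.pi / 2))  # 90 degrees about z
```

A gradient step taken this way needs only three parameters per rotation and never requires re-orthonormalizing a matrix, which is one motivation for differentiating in the tangent space.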
ISBN: 0818672587 (print)
This paper proposes a method for detecting obstacles on a runway by controlling their expected disparities. By approximating the runway by a planar surface, the initial model flow field (MFF) corresponding to an obstacle-free runway is described by the data from onboard sensors (OBS). The error variance of the initial MFF is computed and used to estimate the MFF. Obstacles are detected by comparing the expected residual flow disparities with the residual flow field (RFF) estimated after warping (or stabilizing) an image using the MFF. Expected temporal and spatial disparities are obtained from the use of the OBS. This allows us to control the residual disparities by increasing the temporal baseline and/or by utilizing the spatial baseline if distant objects cannot be detected for a given temporal baseline. Experimental results for two real flight image sequences are presented.
ISBN: 9781467312288 (print)
The ability to normalize pose based on super-category landmarks can significantly improve models of individual categories when training data are limited. Previous methods have considered the use of volumetric or morphable models for faces and for certain classes of articulated objects. We consider methods which impose fewer representational assumptions on categories of interest, and exploit contemporary detection schemes which consider the ensemble of responses of detectors trained for specific pose-keypoint configurations. We develop representations for poselet-based pose normalization using both explicit warping and implicit pooling as mechanisms. Our method defines a pose-normalized similarity or kernel function that is suitable for nearest-neighbor or kernel-based learning methods.
ISBN: 9780769549897 (print)
We present a quadratic unconstrained binary optimization (QUBO) framework for reasoning about multiple object detections with spatial overlaps. The method maximizes an objective function composed of unary detection confidence scores and pairwise overlap constraints to determine which overlapping detections should be suppressed and which should be kept. The framework is flexible enough to handle the problem of detecting objects as a shape covering of a foreground mask, and to handle the problem of filtering confidence-weighted detections produced by a traditional sliding-window object detector. In our experiments, we show that our method outperforms two existing state-of-the-art pedestrian detectors.
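A QUBO objective of this kind can be sketched on a toy problem. The scores and the overlap penalty below are made up, and exhaustive enumeration is used only because the example has three variables; the abstract does not state which solver the authors use. Each binary variable says whether a detection is kept, the unary terms reward confident detections, and the pairwise terms penalize keeping two overlapping ones.

```python
from itertools import product


def qubo_value(x, scores, penalties):
    """Objective: sum of kept unary scores minus penalties for kept overlapping pairs."""
    unary = sum(s for s, keep in zip(scores, x) if keep)
    pairwise = sum(p for (i, j), p in penalties.items() if x[i] and x[j])
    return unary - pairwise


def solve_qubo_brute_force(scores, penalties):
    """Enumerate all 2^n keep/suppress assignments; only viable for tiny n."""
    return max(product((0, 1), repeat=len(scores)),
               key=lambda x: qubo_value(x, scores, penalties))


# Hypothetical detections: 0 and 1 overlap heavily, 2 is isolated.
scores = [0.9, 0.8, 0.7]
penalties = {(0, 1): 1.0}
best = solve_qubo_brute_force(scores, penalties)
```

Here the weaker of the two overlapping detections is suppressed while the isolated one is kept, which is the behavior non-maximum suppression approximates greedily.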
ISBN: 9781728171685 (digital and print)
Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of self-attention. One is pairwise self-attention, which generalizes standard dot-product attention and is fundamentally a set operator. The other is patchwise self-attention, which is strictly more powerful than convolution. Our pairwise self-attention networks match or outperform their convolutional counterparts, and the patchwise models substantially outperform the convolutional baselines. We also conduct experiments that probe the robustness of learned representations and conclude that self-attention networks may have significant benefits in terms of robustness and generalization.
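The remark that pairwise self-attention is fundamentally a set operator can be illustrated with standard scaled dot-product attention, the special case that the pairwise form generalizes. This is a minimal sketch with made-up feature values, not the paper's network: the output for a query is a softmax-weighted sum over key/value pairs, so it does not depend on the order in which those pairs are presented.

```python
import math


def dot_product_attention(query, keys, values):
    """Softmax-weighted sum of `values`, weighted by scaled query-key dot products."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / norm
            for d in range(dim)]


# Toy features; the numbers are illustrative only.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out = dot_product_attention(query, keys, values)
# Presenting the same key/value pairs in reverse order gives the same output.
out_permuted = dot_product_attention(query, keys[::-1], values[::-1])
```

Because the operator ignores ordering, any spatial information has to be supplied separately, for example through position encodings.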