Recursive compositional models (RCMs) are hierarchical models that represent the shape/geometry and visual appearance of objects and images at different scales. The key design principle is recursive compositionality. Objects are represented by RCMs in a hierarchical form in which complex structures are composed of more elementary structures. Formally, they are represented by probability distributions defined over graphs with variable topology. Learning techniques are used to learn these models from a limited number of examples of the object by exploiting the recursive structure (some of our papers use supervised learning, while others are unsupervised and induce the object structure). In addition, we can exploit this structure to develop algorithms that perform inference on these RCMs to rapidly detect and recognize objects. This differs from more standard "flat models" of objects, which have much less representational power if they are to maintain efficient learning and inference. The basic properties of an RCM are illustrated in Figures 1 and 2. Because RCMs give a rich hierarchical description of objects and images, they can be applied to a range of tasks including object detection, segmentation, parsing, and image parsing. In all cases, we achieved state-of-the-art results when evaluated on datasets with ground truth.
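A minimal sketch of the recursive-composition idea follows, assuming a simplified additive score: a leaf node is matched directly against image evidence, and a composite node combines its children's scores with a relation term. The node structure, the co-located child placements, and all names are illustrative assumptions, not the RCM formulation itself.

```python
# Minimal sketch of recursive compositionality (illustrative assumptions only).
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class RCMNode:
    name: str
    children: List["RCMNode"] = field(default_factory=list)
    # relation_score penalizes implausible child configurations (assumed form)
    relation_score: Callable[[List[Tuple[float, float]]], float] = lambda placements: 0.0

def score(node: RCMNode,
          placement: Tuple[float, float],
          leaf_evidence: Dict[str, Callable[[Tuple[float, float]], float]]) -> float:
    """Recursively score a node placed at `placement` (x, y)."""
    if not node.children:                                  # leaf: match image evidence
        return leaf_evidence[node.name](placement)
    child_placements = [placement] * len(node.children)    # toy: children co-located
    child_scores = [score(c, p, leaf_evidence) for c, p in zip(node.children, child_placements)]
    return sum(child_scores) + node.relation_score(child_placements)

# Toy usage: a "face" composed of two "eye" parts with constant dummy evidence.
eye = RCMNode("eye")
face = RCMNode("face", children=[eye, eye])
print(score(face, (0.0, 0.0), {"eye": lambda p: 1.0}))     # -> 2.0
```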
Approaches to single-image categorization do not easily generalize to natural time-varying image sequences. In natural environments, object categories tend to have few features that distinguish them from one another and from the surrounding environment. To better discriminate between categories and the surrounding environment, we propose a multi-view categorization approach that exploits the statistics of image sequences rather than single images. The approach is unbiased towards redundant views - that is, it does not matter how many times an object appears from the same viewpoint. At the same time, the approach does not penalize missing views, so we do not have to capture an object at all viewpoints to successfully categorize it. We first present a data set for studying natural environment monitoring: an image sequence of birds at a feeder station. After manual localization, a baseline bag-of-features approach was found to perform significantly worse on the proposed data set than on the standard Caltech 101 data set. We find that our approach increases the categorization accuracy from 48% to 58% on average when compared to an equivalent single-view categorization method. Finally, we show how the same metric proposed for supervised categorization can be used to transform, in an unsupervised manner, an image sequence into a manageable set of categories.
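The redundancy-invariance and missing-view properties can be illustrated with a small sketch: per-frame scores are max-pooled within each viewpoint bin and averaged only over the bins that were observed. The viewpoint binning, the pooling rule, and the function names are assumptions for illustration, not the exact metric of the paper.

```python
# Illustrative view-pooled category score (assumed pooling scheme).
import numpy as np

def sequence_category_score(frame_scores: np.ndarray, view_bins: np.ndarray) -> float:
    """
    frame_scores : (n_frames,) per-frame score for one candidate category
    view_bins    : (n_frames,) integer viewpoint-bin label of each frame (assumed given)
    """
    observed = np.unique(view_bins)
    pooled = [frame_scores[view_bins == b].max() for b in observed]  # redundancy-invariant
    return float(np.mean(pooled))                                    # no penalty for missing bins

# Toy usage: ten frames of the same view count no more than one frame of that view.
scores = np.array([0.9] * 10 + [0.4])
bins   = np.array([0] * 10 + [1])
print(sequence_category_score(scores, bins))  # 0.65, not dominated by the repeated view
```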
ISBN: (Print) 9781424439942
The field of computational anatomy has developed rigorous frameworks for analyzing anatomical shape based on diffeomorphic transformations of a template. However, differences in the algorithms used for template warping, in regularization parameters, and in the template itself lead to different representations of the same anatomy. Variations of these parameters are considered confounding factors, as they give rise to non-unique representations. Recently, extensions of the conventional computational anatomy framework to account for such confounding variations have shown that learning the equivalence class derived from the multitude of representations can lead to improved and more stable morphological descriptors. Herein, we follow that approach, estimating the morphological appearance manifold obtained by varying parameters of the template warping procedure. Our approach parallels work in the computer vision field, in which variations in lighting, pose, and other parameters lead to image appearance manifolds representing the exact same figure in different ways. The proposed framework is then used for groupwise registration and statistical analysis of biomedical images by employing a minimum variance criterion on the selected complete morphological descriptor to perform manifold-constrained optimization, i.e., to traverse each individual's morphological appearance manifold until group variance is minimal. Effectively, this process removes the aforementioned confounding effects and potentially leads to morphological representations reflecting purely biological variations, instead of variations introduced by modeling assumptions and parameter settings. The nonlinearity of a morphological appearance manifold is treated via local PCA approximations of the manifold.
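A hedged sketch of the minimum-variance idea is given below: each subject's appearance manifold is approximated simply by its sampled descriptors (obtained under different warping settings) and traversed by coordinate descent toward the group mean. The sampling-based manifold approximation and the descent scheme are assumptions; the local PCA treatment is only loosely mirrored.

```python
# Minimum-variance selection over sampled appearance manifolds (assumed scheme).
import numpy as np

def min_variance_selection(manifolds, n_iter=10):
    """manifolds: list of (n_samples_i, d) descriptor arrays, one per subject."""
    selection = [m[0] for m in manifolds]            # initial pick per subject
    for _ in range(n_iter):
        group_mean = np.mean(selection, axis=0)
        for i, m in enumerate(manifolds):            # traverse each subject's manifold
            d = np.linalg.norm(m - group_mean, axis=1)
            selection[i] = m[np.argmin(d)]           # pick the point closest to the group mean
    return np.stack(selection)

# Toy usage with random "manifold" samples for 3 subjects in 5-D.
rng = np.random.default_rng(0)
manifolds = [rng.normal(size=(20, 5)) for _ in range(3)]
chosen = min_variance_selection(manifolds)
print(chosen.shape, np.var(chosen, axis=0).sum())    # selected descriptors and group variance
```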
We present a method for transparent watermarking using a custom bidirectional imaging device. The two innovative concepts of our approach are reflectance coding and multiview imaging. In reflectance coding, information is embedded in the angular space of the bidirectional reflectance distribution function (BRDF), and this information can vary at each surface point. In order to achieve a transparent watermark, reflectance coding is implemented using a spatial variation of the Brewster angle. The novel multiview imaging method measures the reflectance over a range of viewing and illumination angles in order to instantly reveal the unknown Brewster angle. Unlike typical in-lab measurements of the Brewster angle or the refractive index, this method does not require accurate prior knowledge of the surface normal, so imaging in non-lab conditions is feasible. Furthermore, a range of incident angles is examined simultaneously, eliminating the need for scanning incidence angles. The approach is well suited for transparent watermarking, where the observer cannot see the watermark because it is comprised of spatial variations of the refractive index. The transparency and angular coding of the watermark have great utility in deterring counterfeit attempts. In this paper, we present the imaging device and demonstrate its effectiveness in detecting and measuring changes in refractive index. This device acts as the decoder in a transparent watermark system.
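The decoding principle can be sketched from the standard Brewster relation: the reflected intensity of p-polarized light dips to a minimum at the Brewster angle, and the refractive index follows as n = tan(theta_B). The sampling scheme, the synthetic Fresnel test data, and all names below are illustrative assumptions, not the device's actual processing pipeline.

```python
# Estimate refractive index from the p-polarized reflectance dip (Brewster relation).
import numpy as np

def refractive_index_from_brewster(angles_deg: np.ndarray, p_intensity: np.ndarray) -> float:
    """angles_deg: sampled incidence angles; p_intensity: measured p-polarized reflectance."""
    theta_b = angles_deg[np.argmin(p_intensity)]   # angle of the reflectance minimum
    return float(np.tan(np.radians(theta_b)))      # Brewster relation n = tan(theta_B)

# Toy usage: synthesize Fresnel p-reflectance for glass (n = 1.5) and recover n.
def fresnel_rp(theta_deg, n):
    t = np.radians(theta_deg)
    tt = np.arcsin(np.sin(t) / n)                  # Snell's law for the refracted angle
    return (np.tan(t - tt) / np.tan(t + tt)) ** 2

angles = np.linspace(1, 89, 500)
print(refractive_index_from_brewster(angles, fresnel_rp(angles, 1.5)))  # close to 1.5
```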
ISBN: (Print) 9781424439942
For the first time, we formulate an auxiliary particle filter jointly in the pixel domain and the modulation domain for tracking infrared targets. This dual domain approach provides an information-rich image representation comprising the pixel domain frames acquired directly from an imaging infrared sensor as well as 18 amplitude modulation functions obtained through a multicomponent AM-FM image analysis. The new dual domain auxiliary particle filter successfully tracks all of the difficult targets in the well-known AMCOM closure sequences in terms of both centroid location and target magnification. In addition, we incorporate the template update procedure into the particle filter formulation to extend the previously studied dual domain track consistency checking mechanism far beyond the normalized cross correlation (NCC) trackers of the past by explicitly quantifying the differences in target signature evolution between the modulation and pixel domains. Experimental results indicate that the dual domain auxiliary particle filter with integrated target signature update provides a significant performance advantage relative to several recent competing algorithms.
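A simplified sketch of the dual-domain weighting inside a basic particle filter is given below: each particle's weight combines a pixel-domain template likelihood with a modulation-domain (AM) likelihood. It omits the auxiliary-variable look-ahead stage of a true APF and the template-update logic, and the Gaussian likelihood forms, window size, and noise scales are assumptions.

```python
# Dual-domain particle weighting inside a basic predict-weight-resample step.
import numpy as np

def patch(img, x, y, h=8, w=8):
    return img[int(y):int(y) + h, int(x):int(x) + w]

def likelihood(obs_patch, template, sigma):
    if obs_patch.shape != template.shape:
        return 1e-12                                   # particle fell off the image
    return np.exp(-np.sum((obs_patch - template) ** 2) / (2 * sigma ** 2))

def pf_step(particles, weights, pixel_frame, am_frame, pixel_tmpl, am_tmpl, rng):
    particles = particles + rng.normal(0, 2.0, particles.shape)       # random-walk dynamics
    w = np.array([likelihood(patch(pixel_frame, x, y), pixel_tmpl, 10.0) *
                  likelihood(patch(am_frame, x, y), am_tmpl, 5.0)      # fuse both domains
                  for x, y in particles]) * weights
    w /= w.sum() + 1e-300
    idx = rng.choice(len(particles), size=len(particles), p=w)         # resample
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy usage on random frames with a known 8x8 template location.
rng = np.random.default_rng(1)
pixel_frame = rng.random((64, 64)); am_frame = rng.random((64, 64))
pixel_tmpl = patch(pixel_frame, 20, 30); am_tmpl = patch(am_frame, 20, 30)
particles = rng.uniform(0, 56, size=(200, 2))
weights = np.full(200, 1 / 200)
particles, weights = pf_step(particles, weights, pixel_frame, am_frame, pixel_tmpl, am_tmpl, rng)
print(particles.mean(axis=0))                          # particle mean after one step
```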
ISBN: (Print) 9781424439942
We empirically evaluate a distance-guided learning method embedded in a multiple classifier system (MCS) for tissue segmentation in optical images of the uterine cervix. Instead of combining multiple base classifiers as in traditional ensemble methods, we propose a Bhattacharyya distance based metric for measuring the similarity in decision boundary shapes between a pair of statistical classifiers. By generating an ensemble of base classifiers trained independently on separate training images, we can use the distance metric to select those classifiers in the ensemble whose decision boundaries are similar to that of an unknown test image. In the extreme case, we select the base classifier with the most similar decision boundary to accomplish classification and segmentation on the test image. Our approach is novel in the way the nearest neighbor is picked, and it effectively solves classification problems in which base classifiers with good overall performance are not easy to construct due to large variation in the training examples. In our experiments, we applied our method and several popular ensemble methods to segmenting acetowhite regions in cervical images. The overall classification accuracy of the proposed method is significantly better than that of a single classifier learned from the entire training set, and is also superior to other ensemble methods, including majority voting, STAPLE, Boosting, and Bagging.
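One hedged way to realize such a classifier-to-classifier distance is sketched below: both classifiers are evaluated on a common reference set and the histograms of their posterior outputs are compared with the Bhattacharyya distance. The histogramming scheme and the reference set are assumptions, not necessarily the construction used in the paper.

```python
# Bhattacharyya distance between two classifiers' posterior histograms (assumed scheme).
import numpy as np

def bhattacharyya_distance(p: np.ndarray, q: np.ndarray) -> float:
    p = p / p.sum(); q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))                 # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))

def classifier_distance(clf_a, clf_b, reference_X, bins=20):
    """clf_a, clf_b: callables mapping (n, d) features to (n,) posteriors P(class 1 | x)."""
    ha, _ = np.histogram(clf_a(reference_X), bins=bins, range=(0, 1))
    hb, _ = np.histogram(clf_b(reference_X), bins=bins, range=(0, 1))
    return bhattacharyya_distance(ha.astype(float), hb.astype(float))

# Toy usage: two logistic-style scorers with shifted decision boundaries on one feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
clf1 = lambda X: 1 / (1 + np.exp(-(X[:, 0] - 0.0)))
clf2 = lambda X: 1 / (1 + np.exp(-(X[:, 0] - 0.5)))
print(classifier_distance(clf1, clf1, X))       # ~0: identical classifiers
print(classifier_distance(clf1, clf2, X))       # > 0: shifted decision boundary
```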
ISBN: (Print) 9781424439942
In this paper, we introduce a novel technique called Geometric Sequence (GS) imaging, specifically for the purpose of low-power and lightweight tracking in human computer interface design. The imaging sensor is programmed to capture the scene as a train of packets, where each packet consists of a few images. The delay or baseline associated with consecutive image pairs in a packet follows a fixed ratio, as in a geometric sequence. The image pair with a shorter baseline or delay captures fast motion, while the image pair with a larger baseline or delay captures slow motion. Given an image packet, the motion confidence maps computed from the slow and fast image pairs are fused into a single map. Next, we use a Bayesian update scheme to compute the motion hypothesis probability map given the information from prior packets. We estimate the motion from this probability map. The GS imaging system reliably tracks slow movements as well as fast movements, a feature that is important in realizing applications such as a touchpad-type system. Compared to continuous imaging with a short delay between consecutive pairs, the GS imaging technique enjoys several advantages: the overall power consumption and the CPU load are significantly lower. We present results in the domain of optical camera based human computer interface (HCI) applications, as well as for touchpad systems based on capacitive fingerprint imaging sensors.
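The per-packet fusion and Bayesian update can be sketched as follows; the max-fusion rule and the treatment of fused confidences as likelihoods are assumptions made purely for illustration.

```python
# Packet-wise fusion of fast/slow motion confidences and a Bayesian update (assumed rules).
import numpy as np

def fuse_confidences(conf_fast: np.ndarray, conf_slow: np.ndarray) -> np.ndarray:
    """Both maps are (n_hypotheses,) confidences over candidate motions."""
    return np.maximum(conf_fast, conf_slow)          # keep the stronger response

def bayesian_update(prior: np.ndarray, fused_conf: np.ndarray) -> np.ndarray:
    posterior = prior * fused_conf                   # treat fused confidence as a likelihood
    return posterior / (posterior.sum() + 1e-12)

# Toy usage: 5 motion hypotheses, two packets in a row.
prior = np.full(5, 0.2)                              # uniform before the first packet
packet1 = bayesian_update(prior, fuse_confidences(np.array([0.1, 0.8, 0.2, 0.1, 0.1]),
                                                  np.array([0.2, 0.3, 0.2, 0.1, 0.1])))
packet2 = bayesian_update(packet1, fuse_confidences(np.array([0.1, 0.7, 0.3, 0.1, 0.1]),
                                                    np.array([0.1, 0.6, 0.2, 0.1, 0.1])))
print(np.argmax(packet2))                            # hypothesis 1 dominates after two packets
```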
Summary form only given: We present the LHI dataset, a large-scale ground truth image dataset, and a top-down/bottom-up scheme for scheduling the inference processes in stochastic image grammar (SIG). The development of stochastic image grammars needs ground truth image data for diverse training and evaluation purposes, which can only be collected through manual annotation of thousands of images for a variety of object categories. This is too time-consuming a task for each research lab to do independently, and a centralized general-purpose ground truth dataset is much needed. In response to this need, the Lotus Hill Institute (LHI), an independent non-profit research institute in China, was founded in the summer of 2005. It has a full-time annotation team for parsing image structures and a development team for the annotation tools and database construction. Each image or object is parsed, semi-automatically, into a parse graph in which the relations are specified and objects are named using the WordNet standard. The Lotus Hill Institute now has over 500,000 images (or video frames) parsed, covering 280 object categories. On the computational side, we present a method for scheduling bottom-up and top-down processes in image parsing with an and-or graph (AoG) to advance performance and speed up on-line computation. For each node in an AoG, two types of bottom-up computing processes and one kind of top-down computing process are identified.
ISBN: (Print) 9781424439942
In this paper, we address the problem of recovering a hyperspectral texture descriptor. We do this by viewing the wavelength-indexed bands corresponding to the texture in the image as arising from a stochastic process whose statistics can be captured by making use of the relationships between moment generating functions and Fourier kernels. In this manner, we can interpret the probability distribution of the hyperspectral texture as a heavy-tailed one, which can be rendered invariant to affine geometric transformations on the texture plane by making use of the spectral power of its Fourier cosine transform. We do this by recovering the affine geometric distortion matrices corresponding to the probability density function for the texture under study. This treatment permits the development of a robust descriptor that has a high information compaction property and can capture the space and wavelength correlation for the spectra in hyperspectral images. We illustrate the utility of our descriptor for purposes of recognition and provide results on real-world datasets. We also compare our results to those yielded by a number of alternatives.
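A rough sketch of a descriptor in this spirit is given below: each wavelength-indexed band is cosine-transformed and the low-frequency log spectral power is retained, discarding phase. The band-wise DCT, log compression, and truncation are assumptions made for this illustration and do not reproduce the paper's moment-generating-function treatment or affine-distortion recovery.

```python
# Spectral-power texture descriptor over wavelength-indexed bands (assumed construction).
import numpy as np
from scipy.fftpack import dctn                      # 2-D discrete cosine transform

def hyperspectral_texture_descriptor(cube: np.ndarray, keep: int = 8) -> np.ndarray:
    """cube: (rows, cols, n_bands) hyperspectral texture patch."""
    feats = []
    for b in range(cube.shape[2]):
        power = dctn(cube[:, :, b], norm="ortho") ** 2        # spectral power per band
        feats.append(np.log1p(power[:keep, :keep]).ravel())   # low-frequency block only
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-12)

# Toy usage on a random 32x32 patch with 10 wavelength bands.
cube = np.random.default_rng(0).random((32, 32, 10))
print(hyperspectral_texture_descriptor(cube).shape)           # (10 * 8 * 8,) = (640,)
```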
A robust classification method is developed on the basis of sparse subspace decomposition. The method decomposes a mixture of subspaces of unlabeled data (queries) into as few class subspaces as possible. Each query is classified into the class whose subspace contributes most significantly to the decomposed subspace. Multiple queries from different classes can be classified simultaneously into their respective classes. A practical greedy algorithm for the sparse subspace decomposition is designed for the classification. The present method achieves a high recognition rate and robust performance by exploiting joint sparsity.
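A hedged sketch of such a greedy decomposition follows: class subspaces are spanned by SVD bases of labeled training data, subspaces are added greedily by projection-energy gain over the whole query set (joint sparsity), and each query is assigned to the selected subspace that reconstructs it best. The basis construction and the stopping rule are assumptions, not necessarily the paper's algorithm.

```python
# Greedy sparse subspace decomposition and joint classification (assumed algorithm).
import numpy as np

def class_basis(X: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d, rank) of a class subspace from training samples X (n, d)."""
    _, _, vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return vt[:rank].T

def greedy_subspace_classify(queries, bases, max_classes=2):
    """queries: (m, d); bases: dict class -> (d, r) orthonormal basis. Returns m labels."""
    Q = np.asarray(queries, dtype=float)
    selected, residual = [], Q.copy()
    for _ in range(max_classes):                          # use as few subspaces as possible
        gains = {c: np.linalg.norm(residual @ B, axis=1).sum()
                 for c, B in bases.items() if c not in selected}
        if not gains:
            break
        best = max(gains, key=gains.get)
        selected.append(best)
        B = bases[best]
        residual = residual - (residual @ B) @ B.T        # remove that subspace's component
    # classify each query by the selected subspace with the largest projection energy
    return [max(selected, key=lambda c: np.linalg.norm(q @ bases[c])) for q in Q]

# Toy usage: two 2-D class subspaces embedded in 5-D.
rng = np.random.default_rng(0)
train = {
    "A": rng.normal(size=(50, 2)) @ np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]], float),
    "B": rng.normal(size=(50, 2)) @ np.array([[0, 0, 1, 0, 0], [0, 0, 0, 1, 0]], float),
}
bases = {c: class_basis(X, rank=2) for c, X in train.items()}
queries = np.vstack([train["A"][:3], train["B"][:3]])
print(greedy_subspace_classify(queries, bases))           # ['A', 'A', 'A', 'B', 'B', 'B']
```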