The study of 2D shapes and their similarities is a central problem in the field of vision. It arises in particular from the task of classifying and recognizing objects from their observed silhouette. Defining natural distances between 2D shapes creates a metric space of shapes, whose mathematical structure is inherently relevant to the classification task. One intriguing metric space comes from using conformal mappings of 2D shapes into each other, via the theory of Teichmüller spaces. In this space, every simple closed curve in the plane (a "shape") is represented by a "fingerprint", which is a diffeomorphism of the unit circle to itself (a differentiable and invertible, periodic function). More precisely, every shape defines a unique equivalence class of such diffeomorphisms up to right multiplication by a Möbius map. The fingerprint does not change if the shape is varied by translation and scaling, and any such equivalence class comes from some shape. This coset space, equipped with the infinitesimal Weil-Petersson (WP) Riemannian norm, is a metric space. In this space, it appears very likely that the shortest path between any two shapes is unique and is given by a geodesic connecting them; their distance from each other is obtained by integrating the WP norm along that geodesic. In this paper we concentrate on solving the "welding" problem of "sewing" together conformally the interior and exterior of the unit circle, glued on the unit circle by a given diffeomorphism, to obtain the unique 2D shape associated with this diffeomorphism. This allows us to go back and forth between 2D shapes and their representing diffeomorphisms in this "space of shapes".
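As a compact restatement of the fingerprint and welding correspondence described above, here is one standard convention (the notation and the direction of composition are our assumptions; the paper's normalization may differ):

```latex
% Conformal welding, restated schematically (our notation and convention,
% not necessarily the paper's).
% Let \Gamma be a smooth simple closed curve with interior \Omega_+ and
% exterior \Omega_-, and let \Delta_+, \Delta_- be the interior and exterior
% of the unit circle S^1. Choose conformal maps
\[
  \Phi_+ : \Delta_+ \to \Omega_+ ,
  \qquad
  \Phi_- : \Delta_- \to \Omega_- ,
\]
% then the fingerprint of \Gamma is the circle diffeomorphism
\[
  \varphi \;=\; \Phi_+^{-1} \circ \Phi_- \big|_{S^1} : S^1 \to S^1 ,
\]
% defined up to right composition with a Mobius self-map of the disk.
% The welding problem solved in the paper is the inverse direction:
% given \varphi, recover \Gamma (up to translation and scaling) so that
% the relation above holds.
```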
We develop and evaluate in this paper a multi-classifier framework for atlas-based segmentation, a popular segmentation method in biomedical image analysis. An atlas is a spatial map of classes (e.g., anatomical structures), which is usually derived from a reference individual by manual segmentation. An atlas-based classification is generated by registering an image to an atlas, that is, by computing a semantically correct coordinate mapping between the two. In the present paper the registration algorithm is an intensity-based non-rigid method that computes a free-form deformation (FFD) defined on a uniform grid of control points. The transformation is regularized by a weighted smoothness constraint term. Different atlases, as well as different parameterizations of the registration algorithm, lead to different and somewhat independent atlas-based classifiers. The outputs of these classifiers can be combined in order to improve overall classification accuracy. In an evaluation study, biomedical images from seven subjects are segmented (1) using three individual atlases; (2) using one atlas and three different resolutions of the FFD control point grid; and (3) using one atlas and three different regularization constraint weights. In each case, the three individual segmentations are combined by Sum Rule fusion. For each individual and for each combined segmentation, its recognition rate (relative number of correctly labeled image voxels) is computed against a manual gold-standard segmentation. In all cases, classifier combination consistently improved classification accuracy. The biggest improvement was achieved using multiple atlases, a smaller gain resulted from multiple regularization constraint weights, and a marginal gain resulted from multiple control point spacings. We conclude that multi-classifier methods have a natural application to atlas-based segmentation and the potential to increase classification accuracy in real-world segmentation problems.
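A minimal sketch of the Sum Rule fusion and recognition-rate computation described above, assuming each atlas-based classifier can emit a per-voxel, per-class score map (array shapes and function names are ours, not the paper's):

```python
import numpy as np

def sum_rule_fusion(score_maps):
    """Combine atlas-based classifiers by the Sum Rule.

    score_maps: list of arrays, each of shape (num_classes, X, Y, Z),
                one per classifier (e.g., per atlas or per parameterization).
    returns:    label volume of shape (X, Y, Z) holding the arg-max class
                of the summed scores at every voxel.
    """
    combined = np.sum(score_maps, axis=0)   # sum scores over classifiers
    return np.argmax(combined, axis=0)      # winning class per voxel

def recognition_rate(labels, gold_standard):
    """Relative number of correctly labeled voxels, measured against a
    manual gold-standard segmentation."""
    return float(np.mean(labels == gold_standard))
```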
Image retrieval critically relies on the distance function used to compare a query image to images in the database. We suggest learning such distance functions by training binary classifiers with margins, where the classifiers are defined over the product space of pairs of images. The classifiers are trained to distinguish between pairs in which the images are from the same class and pairs in which the images are from different classes. The signed margin is used as a distance function. We explore several variants of this idea, based on using SVM and boosting algorithms as product space classifiers. Our main contribution is a distance learning method which combines boosting hypotheses over the product space with a weak learner based on partitioning the original feature space. The weak learner used is a Gaussian mixture model computed using a constrained EM algorithm, where the constraints are equivalence constraints on pairs of data points. This approach allows us to incorporate unlabeled data into the training process. Using several benchmark databases from the UCI repository, we show that our margin-based methods significantly outperform existing metric learning methods, which are based on learning a Mahalanobis distance. We then show comparative results of image retrieval in a distributed learning paradigm, using two databases: a large database of facial images (YaleB) and a database of natural images taken from a commercial CD. In both cases our GMM-based boosting method outperforms all other methods, and its generalization to unseen classes is superior.
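The core idea of using a signed classifier margin over image pairs as a distance can be sketched as follows; the pair encoding, the SVM stand-in for the paper's SVM/boosting variants, and all names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np
from sklearn.svm import SVC  # stand-in for the paper's product-space classifiers

def pair_features(x1, x2):
    # A simple symmetric encoding of an image pair (assumed, not the paper's).
    return np.concatenate([np.abs(x1 - x2), x1 * x2])

def train_pair_classifier(X, y, n_pairs=5000, seed=0):
    """Train a binary classifier on the product space of image pairs:
    +1 for same-class pairs, -1 for different-class pairs."""
    rng = np.random.default_rng(seed)
    feats, targets = [], []
    for _ in range(n_pairs):
        i, j = rng.integers(0, len(X), size=2)
        feats.append(pair_features(X[i], X[j]))
        targets.append(1 if y[i] == y[j] else -1)
    return SVC(kernel="rbf").fit(np.asarray(feats), np.asarray(targets))

def learned_distance(clf, x1, x2):
    # Negated signed margin: a confident "same class" decision gives a small distance.
    return -clf.decision_function([pair_features(x1, x2)])[0]
```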
This paper proposes a joint feature-based model indexing and geometric-constraint-based alignment pipeline for efficient and accurate recognition of 3D objects from a large model database. Traditional approaches either first prune the model database using indexing without geometric alignment or directly perform recognition-based alignment. Indexing-based pruning methods without geometric constraints can miss the correct models under imperfections such as noise, clutter and obscurations. Alignment-based verification methods have to verify each model in the database linearly and hence do not scale up. The proposed technique uses spin images as semi-local shape descriptors and locality-sensitive hashing (LSH) to index into a joint spin image database for all the models. The candidate models returned by indexing are further pruned using progressively more complex geometric constraints. A simple geometric configuration of multiple spin images, for instance a doublet, is first used to check for geometric consistency. Subsequently, full Euclidean geometric constraints are applied using RANSAC-based techniques on the pruned spin images and models to verify a specific object identity. As a result, the combined indexing and geometric alignment pipeline is able to focus on matching the most promising models and generates far fewer pose hypotheses while maintaining the same level of performance as sequential alignment-based recognition. Furthermore, compared to geometric indexing techniques such as geometric hashing, the construction time and storage complexity of the proposed technique remain linear in the number of features rather than a higher-order polynomial. Experiments on a database of 56 3D models show promising results.
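The staged prune-then-verify structure of the pipeline might look roughly like this; `lsh_index`, `doublet_consistent`, and `ransac_align` are hypothetical helpers standing in for the LSH table over all models' spin images, the doublet consistency check, and the RANSAC-based Euclidean verification, and the thresholds are placeholders:

```python
def recognize(scene_features, lsh_index, models, doublet_consistent, ransac_align,
              min_votes=5, min_inliers=20):
    """Joint indexing + geometric-alignment pipeline (schematic sketch)."""
    # Stage 1: index every scene spin image into the joint database with LSH,
    # accumulating candidate (scene feature, model feature) matches per model.
    candidates = {}
    for f in scene_features:
        for model_id, model_feature in lsh_index.query(f.descriptor):
            candidates.setdefault(model_id, []).append((f, model_feature))

    # Stage 2: cheap pruning with a simple geometric configuration (a doublet):
    # keep only correspondences that are consistent with at least one partner.
    pruned = {}
    for model_id, corrs in candidates.items():
        consistent = [c for c in corrs
                      if any(doublet_consistent(c, other)
                             for other in corrs if other is not c)]
        if len(consistent) >= min_votes:
            pruned[model_id] = consistent

    # Stage 3: full Euclidean verification with RANSAC only on surviving models,
    # so far fewer pose hypotheses are generated than by linear verification.
    results = []
    for model_id, corrs in pruned.items():
        pose, inliers = ransac_align(models[model_id], corrs)
        if inliers >= min_inliers:
            results.append((model_id, pose, inliers))
    return sorted(results, key=lambda r: -r[2])
```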
The recent establishment of a large-scale ground-truth database of image segmentations [D. Martin et al., 2001] has enabled the development of learning approaches to the general segmentation problem. Using this database, we present an algorithm that learns how to segment images using region-based perceptual features. The image is first densely segmented into regions, and the edges between them, using a variant of the Mumford-Shah functional. Each edge is classified as boundary or non-boundary using a classifier trained on the ground truth, resulting in an edge image that estimates human-designated boundaries. This novel approach has a few distinct advantages over filter-based methods such as local gradient operators. First, the same perceptual features can represent texture as well as regular structure. Second, the features can measure relationships between image elements at arbitrary distances in the image, enabling the detection of Gestalt properties at any scale. Third, texture boundaries can be precisely localized, which is difficult when using filter banks. Finally, the learning system outputs a relatively small set of intuitive perceptual rules for detecting boundaries. The classifier is trained on 200 images in the ground-truth database and tested on another 100 images according to the benchmark evaluation methods. Edge classification improves the benchmark F-score from 0.54, for the initial Mumford-Shah-variant segmentation, to 0.61 on grayscale images. This increase of 13% demonstrates the versatility and representational power of our perceptual features, as the score exceeds published results for any algorithm restricted to one type of image feature, such as texture or brightness gradient.
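A hedged sketch of the edge-classification step: starting from the edges of an initial dense segmentation, a classifier trained on human ground truth decides boundary versus non-boundary. The feature extraction and the random-forest stand-in for the paper's rule-based learner are assumptions, not the paper's method:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # stand-in for the paper's rule learner

def train_boundary_classifier(edge_features, edge_is_boundary):
    """edge_features:    (n_edges, n_features) region-based perceptual features
                         computed for each edge of the initial segmentation.
       edge_is_boundary: (n_edges,) labels derived from human-marked boundaries."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return clf.fit(edge_features, edge_is_boundary)

def f_score(precision, recall):
    """Benchmark F-measure used to score the resulting boundary maps."""
    return 2.0 * precision * recall / (precision + recall)

# The reported improvement: F rises from 0.54 to 0.61,
# i.e. (0.61 - 0.54) / 0.54 ~= 0.13, the 13% increase quoted above.
```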
The following topics are discussed: computer vision; illumination and appearance-based matching; tracking; shape recognition; image segmentation; medical image analysis; 3D reconstruction; face recognition; motion estimation; illumination and image restoration; fingerprint recognition; image registration; image retrieval; and reflectance models.
The introduction of airbags into automobiles has significantly improved the safety of occupants. Unfortunately, airbags can also cause fatal injuries if the occupant is a child smaller (in weight) than a typical 6-year-old. In response to this, the National Highway Traffic Safety Administration (NHTSA) has mandated that, starting in the 2006 model year, all automobiles be equipped with an automatic suppression system to detect the presence of a child or infant and suppress the airbag. The classification problem we address is a four-class problem, with the classes being rear-facing infant seat, child, adult, and empty seat. We describe a machine-vision-based occupant classification system using a single greyscale camera and a digital signal processor that can perform this function in real time.
We question the role that large-scale filter banks have traditionally played in texture classification. It is demonstrated that textures can be classified using the joint distribution of intensity values over extremely compact neighbourhoods (starting from as small as 3 x 3 pixels square), and that this outperforms classification using filter banks with large support. We develop a novel texton-based representation which is suited to modelling this joint neighbourhood distribution for MRFs. The representation is learnt from training images and then used to classify novel images (with unknown viewpoint and lighting) into texture classes. The power of the method is demonstrated by classifying over 2800 images of all 61 textures present in the Columbia-Utrecht database. The classification performance surpasses that of recent state-of-the-art filter-bank-based classifiers such as Leung & Malik [IJCV 01], Cula & Dana [CVPR 01], and Varma & Zisserman [ECCV 02].
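A minimal sketch of classifying textures from the joint distribution of raw intensities in small neighbourhoods, using a texton dictionary learned by clustering; the clustering choice, dictionary size, and chi-squared nearest-neighbour matching are assumptions consistent with texton-based classifiers, not the paper's exact recipe:

```python
import numpy as np
from sklearn.cluster import KMeans

def patch_vectors(img, size=3):
    """All size x size neighbourhoods of a grayscale image, flattened to vectors."""
    h, w = img.shape
    r = size // 2
    return np.array([img[i - r:i + r + 1, j - r:j + r + 1].ravel()
                     for i in range(r, h - r) for j in range(r, w - r)], dtype=float)

def learn_textons(train_images, n_textons=610, seed=0):
    """Cluster raw patch vectors from training images into a texton dictionary."""
    data = np.vstack([patch_vectors(im) for im in train_images])
    return KMeans(n_clusters=n_textons, n_init=4, random_state=seed).fit(data)

def texton_histogram(img, textons):
    """Model an image by the frequency histogram of its nearest textons."""
    labels = textons.predict(patch_vectors(img))
    hist = np.bincount(labels, minlength=textons.n_clusters).astype(float)
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance; a novel image is assigned the class of its
    nearest training histogram."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```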
The proceedings contain 112 papers. Topics discussed include illumination and appearance-based matching, motion and layers, tracking, shape, recognition, segmentation and medical image analysis, three-dimensional reconstruction, face detection and recognition, learning and statistical methods, calibration and structure from motion, face and gesture, and image restoration.