ISBN: (Print) 769501494
In this paper, we adapt the Multiple Instance Learning paradigm using the Diverse Density algorithm as a way of modeling the ambiguity in images in order to learn 'visual concepts' that can be used to classify new images. In this framework, a user labels an image as positive if the image contains the concept. Each example image is a bag of instances (sub-images) where only the bag is labeled, not the individual instances. From a small collection of positive and negative examples, the system learns the concept and uses it to retrieve images that contain the concept from a large database. The learned 'concepts' are simple templates that capture the color, texture and spatial properties of the class of images. We introduced this method earlier in the domain of natural scene classification using simple, low-resolution sub-images as instances. In this paper, we extend the bag generator (the mechanism which takes an image and generates a set of instances) to generate more complex instances using multiple cues on segmented high-resolution images. We show that this method can be used to learn certain object class concepts (e.g. cars) in addition to natural scenes.
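To make the bag-and-instance idea concrete, here is a minimal sketch of a Diverse Density score under the commonly used noisy-or model. It is an illustration, not the authors' implementation: the feature vectors, the unscaled squared distance, and the function names (`instance_prob`, `diverse_density`, `best_concept`) are assumptions, and the search simply evaluates the score at every positive instance instead of running an optimizer.

```python
import math

def instance_prob(instance, t):
    """Probability that `instance` matches concept point `t` (Gaussian-like falloff)."""
    d2 = sum((a - b) ** 2 for a, b in zip(instance, t))
    return math.exp(-d2)

def diverse_density(t, positive_bags, negative_bags):
    """Noisy-or Diverse Density of candidate concept `t`.

    A positive bag supports `t` if at least one of its instances is close;
    a negative bag supports `t` only if all of its instances are far away.
    """
    dd = 1.0
    for bag in positive_bags:
        miss = 1.0
        for inst in bag:
            miss *= 1.0 - instance_prob(inst, t)
        dd *= 1.0 - miss
    for bag in negative_bags:
        for inst in bag:
            dd *= 1.0 - instance_prob(inst, t)
    return dd

def best_concept(positive_bags, negative_bags):
    """Simplified search (assumption): try each positive instance as the concept."""
    candidates = [inst for bag in positive_bags for inst in bag]
    return max(candidates, key=lambda t: diverse_density(t, positive_bags, negative_bags))

# Hypothetical 2-D features: both positive bags contain an instance near (0, 0).
positive_bags = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (9.0, 1.0)]]
negative_bags = [[(5.0, 5.1), (9.0, 1.1)]]
concept = best_concept(positive_bags, negative_bags)
```

In this toy data the point near the origin scores highest because it is close to an instance in every positive bag and far from every negative instance, which is the property the learned concept is meant to capture.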
Background estimation and removal based on the joint use of range and color data produces results superior to those achievable with either data source alone. This is increasingly relevant as inexpensive, real-time, passive range systems become more accessible through novel hardware and increased CPU processing speeds. Range is a powerful signal for segmentation which is largely independent of color, and hence not affected by the classic color segmentation problems of shadows and objects with color similar to the background. However, range alone is also not sufficient for good segmentation: depth measurements are rarely available at all pixels in the scene, and foreground objects may be indistinguishable in depth when they are close to the background. Color segmentation is complementary in these cases. Surprisingly, little work has been done to date on joint range and color segmentation. We describe and demonstrate a background estimation method based on a multidimensional (range and color) clustering at each image pixel. Segmentation of the foreground in a given frame is performed via comparison with background statistics in range and normalized color. Important implementation issues such as treatment of shadows and low confidence measurements are discussed in detail.
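The per-pixel comparison against background statistics can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the paper: the background is reduced to a single mean and standard deviation per pixel, the threshold `k`, the `min_brightness` cutoff, and the names `Pixel`, `Background`, and `is_foreground` are hypothetical, and the two cues are combined with a plain logical OR.

```python
from collections import namedtuple

# depth is None when no range measurement is available at the pixel
Pixel = namedtuple("Pixel", "rgb depth")
Background = namedtuple("Background", "chroma_mean chroma_std depth_mean depth_std")

def normalized_color(rgb, min_brightness=30):
    """Return (r, g) chromaticity, or None when the pixel is too dark to trust."""
    r, g, b = rgb
    total = r + g + b
    if total < min_brightness:
        return None
    return (r / total, g / total)

def is_foreground(pixel, bg, k=3.0):
    """Flag a pixel as foreground if range or normalized color deviates from the background."""
    depth_fg = False
    if pixel.depth is not None and bg.depth_mean is not None:
        depth_fg = abs(pixel.depth - bg.depth_mean) > k * bg.depth_std

    color_fg = False
    chroma = normalized_color(pixel.rgb)
    if chroma is not None:
        color_fg = any(abs(c - m) > k * bg.chroma_std
                       for c, m in zip(chroma, bg.chroma_mean))
    return depth_fg or color_fg

# Hypothetical background: gray wall 5 m away.
bg = Background(chroma_mean=(1 / 3, 1 / 3), chroma_std=0.02, depth_mean=5.0, depth_std=0.1)
shadow = Pixel(rgb=(50, 50, 50), depth=5.0)            # darker, same chromaticity and depth
gray_object = Pixel(rgb=(100, 100, 100), depth=2.0)    # same color as wall, but closer
red_no_depth = Pixel(rgb=(200, 20, 20), depth=None)    # no range, distinct color
```

The three example pixels show the complementarity described in the abstract: the shadow is rejected because normalized color discounts brightness, the gray object is caught by range alone, and the red pixel is caught by color when range is missing.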
A new fully automated shape learning method is presented. It is based on clustering a set of training shapes in the original shape space (defined by the coordinates of the contour points) and performing a Procrustes analysis on each cluster to obtain cluster prototypes and information about shape variation. The main difference from previously reported methods is that the training set is first automatically clustered and those shapes considered to be outliers are discarded. The second difference is in the manner in which registered sets of points are extracted from each shape contour. As a direct application of our shape learning method, an 11-structure shape model of brain substructures was extracted from MR image data, and an eigen-shape model was automatically trained and employed to segment several MR brain images not present in the shape-training set. A quantitative analysis of our shape registration approach, within the main cluster of each structure, shows that our results compare very well to those achieved by manual registration, achieving an average ms error of about 1 pixel. Our approach can serve as a fully automated substitute for tedious and time-consuming manual shape registration and analysis.
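The alignment step at the core of a Procrustes analysis can be sketched for two 2-D contours as below. This is a minimal illustration, assuming the contour points are already in one-to-one correspondence and are stored as complex numbers (x + iy); the function names are hypothetical, and a full analysis of a cluster would iterate this pairwise step against an evolving mean shape.

```python
import cmath
import math

def centroid(shape):
    return sum(shape) / len(shape)

def procrustes_align(shape, reference):
    """Least-squares similarity alignment (translation, rotation, scale) of `shape` to `reference`.

    Both inputs are equal-length lists of complex numbers with corresponding points.
    """
    ref_mean = centroid(reference)
    z = [p - ref_mean for p in reference]
    shape_mean = centroid(shape)
    w = [p - shape_mean for p in shape]
    # Optimal complex factor c (rotation and scale) minimizing sum |z_i - c * w_i|^2
    c = sum(wi.conjugate() * zi for wi, zi in zip(w, z)) / sum(abs(wi) ** 2 for wi in w)
    return [c * wi + ref_mean for wi in w]

def rms_error(a, b):
    """Root-mean-square point distance between two corresponding shapes."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)) / len(a))

# A unit square, and a rotated, scaled, translated copy of it.
reference = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved = [2 * cmath.exp(1j * math.pi / 6) * p + (3 - 2j) for p in reference]
aligned = procrustes_align(moved, reference)
residual = rms_error(aligned, reference)
```

Because the second shape differs from the first only by a similarity transform, the residual after alignment is zero up to rounding; for real training shapes the residual measures genuine shape variation within the cluster.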
The minimal data necessary for projective reconstruction from point correspondences is well known when the points are visible in all images. In this paper, we formulate and propose solutions to a new family of reconstruction problems from multiple images with minimal data, where there are missing points in some of the images. The ability to handle the minimal cases with missing data is of great theoretical and practical importance. Such minimal solutions are needed to bootstrap robust estimation such as RANSAC and LMS algorithms and optimal estimation such as bundle adjustment. First, we develop a framework to parametrize the multiple view geometry, needed to handle the missing data cases. Then we present a solution to the minimal case of 8 points in 3 images, where one of the points is missing in one of the three images. We prove that there are in general as many as 11 solutions for this minimal case. Furthermore, all minimal cases with missing data for 3 and 4 images are catalogued. Finally, we demonstrate the method on both simulated and real images and show that the algorithms presented in this paper can be used for practical problems.
In this work, we use points, lines, and the linear extremal contours of cylinders to estimate the position and orientation of the camera in the world coordinate system. Other line-based pose estimation methods use the correspondences between 3D lines in space and 2D image lines, although the model and its observation are finite line segments. We present a noise model describing the probabilistic relationship between 3D lines and cylinders and their noisy observations. The noise model takes the finite nature of the observation into account. Position and orientation of cameras are estimated using a maximum-likelihood approach. Covariance matrices, confidence limits, and standard translation and rotation errors are estimated by singular value decomposition of the Jacobian matrix. The method provides clear indications of the reliability of each of the estimated parameters, and enables the user to add appropriate information, in terms of feature correspondence, to improve the accuracy if necessary. Simulation results are used to compare this method with some of the previously published ones. The algorithm is currently being used on real data for the update of 3D CAD models of industrial environments.
Superquadrics are a family of parametric shapes which can model a diverse set of objects. They have received significant attention because of their compact representation and robust methods for recovery of 3D models. However, their assumption of intrinsic symmetry fails in modeling numerous real-world examples such as the human body, animals, and other naturally occurring objects. In this paper, we present a novel approach, called the extended superquadric, which extends the representation power of superquadrics with exponent functions. An extended superquadric model can be deformed in any direction because it extends the exponents of superquadrics from constants to functions of the latitude and longitude angles in the spherical coordinate system. Thus, extended superquadrics can model more complex shapes than superquadrics. They also maintain many desired properties of superquadrics such as compactness, controllability, and intuitive meaning, which are all advantageous for shape modeling, recognition, and reconstruction. In this paper, besides the use of extended superquadrics for modeling, we also discuss our research into the recovery of extended superquadrics from 3D information (reconstruction). Experimental results of fitting extended superquadrics to 3D real data are presented. Our results are very encouraging and indicate that the use of extended superquadrics is a promising paradigm for shape representation and recovery in computer vision and has potential benefits for the generation of synthetic images for computer graphics.
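The step from constant exponents to exponent functions can be illustrated with the standard parametric form of a superquadric surface. The sketch below is one plausible reading of the idea, not the formulation of the paper: it assumes the first exponent depends only on the latitude angle and the second only on the longitude angle, and the names `surface_point`, `e1`, and `e2` are hypothetical.

```python
import math

def signed_pow(base, exponent):
    """sign(base) * |base| ** exponent, as used in superquadric parameterizations."""
    return math.copysign(abs(base) ** exponent, base)

def surface_point(eta, omega, size, e1, e2):
    """Point on an (extended) superquadric.

    eta   : latitude angle in [-pi/2, pi/2]
    omega : longitude angle in [-pi, pi)
    size  : (a1, a2, a3) semi-axis lengths
    e1, e2: callables returning the exponents for the given angle
            (assumption: e1 depends on eta, e2 on omega)
    """
    eps1 = e1(eta)
    eps2 = e2(omega)
    a1, a2, a3 = size
    x = a1 * signed_pow(math.cos(eta), eps1) * signed_pow(math.cos(omega), eps2)
    y = a2 * signed_pow(math.cos(eta), eps1) * signed_pow(math.sin(omega), eps2)
    z = a3 * signed_pow(math.sin(eta), eps1)
    return (x, y, z)

size = (2.0, 1.0, 0.5)

# Constant exponents of 1 reduce to an ordinary ellipsoid.
ellipsoid_pt = surface_point(0.5, 1.2, size, lambda eta: 1.0, lambda omega: 1.0)

# An exponent that varies with latitude deforms the shape asymmetrically.
extended_pt = surface_point(0.5, 1.2, size,
                            lambda eta: 1.0 + 0.5 * math.sin(eta),
                            lambda omega: 1.0)
```

With constant functions the model is an ordinary superquadric, so the extension keeps the original family as a special case while allowing the upper and lower halves, for example, to take different squareness.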
A mechanism is proposed that integrates low-level (image processing), mid-level (recursive 3D trajectory estimation), and high-level (action recognition) processes. It is assumed that the system observes multiple moving objects via a single, uncalibrated video camera. A novel extended Kalman filter formulation is used in estimating the relative 3D motion trajectories up to a scale factor. The recursive estimation process provides a prediction and error measure that is exploited in higher-level stages of action recognition. Conversely, higher-level mechanisms provide feedback that allows the system to reliably segment and maintain the tracking of moving objects before, during, and after occlusion. The 3D trajectory, occlusion, and segmentation information are utilized in extracting stabilized views of the moving object. Trajectory-guided recognition (TGR) is proposed as a new and efficient method for adaptive classification of action. The TGR approach is demonstrated using 'motion history images' that are then recognized via a mixture-of-Gaussians classifier. The system was tested in recognizing various dynamic human outdoor activities, e.g., running, walking, roller blading, and cycling. Experiments with synthetic data sets are used to evaluate stability of the trajectory estimator with respect to noise.
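A motion history image encodes how recently each pixel moved. The sketch below shows the usual update rule on plain nested lists; it is an illustration only, assuming a binary motion mask is already available for each stabilized frame, and the function name `update_mhi` and the duration parameter `tau` are hypothetical choices.

```python
def update_mhi(mhi, motion_mask, tau):
    """One motion-history update step.

    Pixels that moved in the current frame are set to `tau`;
    all other pixels decay by one, bottoming out at zero.
    """
    return [
        [tau if moving else max(0, previous - 1)
         for previous, moving in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, motion_mask)
    ]

# A single moving pixel sweeping left to right across a 1x3 image.
masks = [
    [[1, 0, 0]],
    [[0, 1, 0]],
    [[0, 0, 1]],
]
mhi = [[0, 0, 0]]
for mask in masks:
    mhi = update_mhi(mhi, mask, tau=3)
# mhi is now [[1, 2, 3]]: brighter values mark more recent motion
```

The resulting intensity ramp records both where motion occurred and in which direction it progressed, which is what makes the image usable as a compact feature for a classifier.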
We are developing a system to extract geodetic, textured CAD models from thousands of initially uncontrolled, close-range ground and aerial images of urban scenes. Here we describe one component of the system, which operates after the imagery has been controlled or geo-referenced. This fully automatic component detects significant vertical facades in the scene, then extrudes them to meet an inferred, triangulated terrain and procedurally generated roof polygons. The algorithm then estimates for each surface a computer graphics texture, or diffuse reflectance map, from the many available observations of that surface. We present the results of the algorithm on a complex dataset: nearly 4,000 high-resolution digital images of a small (200 meter square) office park, acquired from close range under highly varying lighting conditions, amidst significant occlusion due both to multiple inter-occluding structures and dense foliage. While the results are of lower fidelity than would be achievable by an interactive system, our algorithm is the first to be demonstrated on such a large, real-world dataset.
This paper presents a novel approach for generating and analyzing epipolar plane images (EPIs) from video sequences taken from a moving platform subject to vibration so that the 3D model of an arbitrary scene can be constructed. Two problems are solved in our approach: (1) how to generate EPIs from video under a more general motion than a pure translation; (2) how to analyze the huge amount of data in the EPIs robustly and efficiently. For the first problem, a 3D image stabilization method is proposed which decouples the vibration from the vehicle's motion so that good EPIs and panoramic view images (PVIs) can be generated. For the second problem, we propose an efficient panoramic EPI analysis (PEPIA) method in which only one scanline of each EPI is processed. The PEPIA combines advantages of PVIs and EPIs and consists of three important steps: locus orientation detection, motion boundary localization, and occlusion/resolution recovery. The output of the PEPIA, a layered 3D panorama, is very useful in visual navigation and virtual reality modeling. Since camera calibration, image segmentation, feature extraction and matching are avoided, all the proposed algorithms are fully automatic and rather general. Results on real image sequences are given.
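For readers unfamiliar with EPIs, the sketch below shows how one is cut from a video volume in the idealized case. It assumes a sequence that is already stabilized and undergoes pure horizontal translation, which is exactly the restriction the paper sets out to relax; the video is a nested list indexed as `video[t][y][x]`, and both function names are hypothetical.

```python
def epipolar_plane_image(video, scanline):
    """Stack the same image row from every frame: rows of the EPI are time, columns are x."""
    return [list(frame[scanline]) for frame in video]

def make_translating_point_video(n_frames, height, width, row, x0, speed):
    """Synthetic test video: one bright pixel moving `speed` pixels per frame along `row`."""
    video = []
    for t in range(n_frames):
        frame = [[0] * width for _ in range(height)]
        x = x0 + speed * t
        if 0 <= x < width:
            frame[row][x] = 255
        video.append(frame)
    return video

video = make_translating_point_video(n_frames=5, height=4, width=16, row=2, x0=1, speed=2)
epi = epipolar_plane_image(video, scanline=2)
locus = [epi_row.index(255) for epi_row in epi]   # x position of the point in each frame
```

The bright point traces a straight locus in the EPI, and its slope (here 2 pixels per frame) reflects image velocity and hence relative depth, which is why locus orientation detection is the first step of the analysis.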
A new approach for segmentation of nuclei observed with an epi-fluorescence microscope is presented. The technique is model based and uses local feature activities such as step-edge segments, roof-edge segments, and concave corners to construct a set of initial hypotheses. These local-feature activities are extracted using either local or global operators to form a possible set of hypotheses. Each hypothesis is expressed as a hyperquadric for better stability, compactness, and error handling. The search space is expressed as an assignment matrix with an appropriate cost function to ensure local, adjacency, and global consistency. Each possible configuration of a set of nuclei defines a path, and the path with the least error corresponds to the best representation. This result is then presented to an operator who verifies and eliminates a small number of errors.