The complexity of real-world image categorization and scene analysis requires compositional strategies for object representation. This contribution establishes a compositional hierarchy by first performing a perceptual bottom-up grouping of edge pixels to generate salient contour curves. A subsequent recursive top-down grouping yields a hierarchy of compositions. All entities in the compositional hierarchy are incorporated in a Bayesian network that couples them together by means of a shape model. The probabilistic model underlying top-down grouping as well as the shape model is learned automatically from a set of training images for the given categories. As a consequence, compositionality simplifies the learning of complex category models by building them from simple, frequently used compositions. The architecture is evaluated on the highly challenging Caltech 101 database, which exhibits large intra-category variations. The proposed compositional approach shows competitive retrieval rates in the range of 53.0 ± 0.49%.
We present a novel feature-based non-rigid image registration algorithm using a small number of automatically extracted points and their associated local salient region features. Our automatic registration is a hybrid approach co-optimizing point-based and image-based terms. Motivated by the paradigm of the TPS-RPM algorithm [6], we develop the RHDM (Robust Hybrid Deformable Matching) algorithm by alternately optimizing correspondences and transformations for registration. The local salient region features and the geometric features, together with the softassign and deterministic annealing techniques, are used for solving correspondences. Thin-plate splines are used for generating a smooth non-rigid spatial transformation. Our algorithm is built to be extremely robust to feature extraction errors. A new dynamic outlier rejection mechanism is described for rejecting outliers and generating accurate spatial mappings. A local refinement technique is used for correcting inexactly matched correspondences arising from image noise and irregular deformations. In contrast with the TPS-RPM algorithm, which can handle outliers in only one point set, our algorithm is able to handle a considerable number of outliers in both point sets. The experimental results demonstrate the robustness and accuracy of our algorithm.
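The correspondence step named in this abstract (softassign under deterministic annealing) can be illustrated with a minimal sketch. It is not the RHDM algorithm: it assumes two equal-sized 2D point sets, uses only squared Euclidean distance as the matching cost, and omits the salient region features, the thin-plate spline update, and all outlier handling. The function name `softassign` and the temperature values are illustrative assumptions.

```python
import math

def softassign(X, Y, T, iters=50):
    """Soft correspondence matrix between point sets X and Y at temperature T.

    Assumption: len(X) == len(Y), so alternating row and column
    normalization (Sinkhorn iterations) approaches a doubly stochastic matrix.
    """
    n, m = len(X), len(Y)
    # Affinity from squared distance; a lower T makes the affinities sharper.
    M = [[math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / T) for y in Y]
         for x in X]
    for _ in range(iters):
        # Normalize every row to sum to 1.
        for i in range(n):
            row_sum = sum(M[i])
            M[i] = [v / row_sum for v in M[i]]
        # Normalize every column to sum to 1.
        for j in range(m):
            col_sum = sum(M[i][j] for i in range(n))
            for i in range(n):
                M[i][j] /= col_sum
    return M

# Y is a permutation of X: X[0]->Y[1], X[1]->Y[2], X[2]->Y[0].
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y = [(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
hot = softassign(X, Y, T=10.0)    # high temperature: fuzzy matches
cold = softassign(X, Y, T=0.05)   # low temperature: nearly a permutation
matches = [max(range(len(Y)), key=lambda j: row[j]) for row in cold]
```

At a high temperature every entry of the matrix stays close to uniform, while at a low temperature each row concentrates on a single column. In an annealing scheme the temperature is lowered gradually while the transformation is re-estimated between correspondence updates; that alternation is left out here.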
We consider the problem of deriving a global interpretation of an image in terms of a small set of smooth curves. The problem is posed using a statistical model for images with multiple curves. Besides having important applications to edge detection and grouping, the curve-finding task is a special case of a more general problem, where we want to explain the whole image in terms of a small set of objects. We describe a novel approach for estimating the content of scenes with multiple objects using a min-cover framework that is simple and powerful. The min-cover problem is NP-hard, but there is a good approximation algorithm that sequentially selects objects minimizing a "cost per pixel" measure. In the case of curve detection, we use a type of best-first search to quickly find good curves for the covering algorithm. The method integrates image data over long curves without relying on binary feature detection. We have applied the curve detection method to finding object boundaries in natural scenes and measured its performance using the Berkeley segmentation dataset.
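The sequential "cost per pixel" selection can be sketched as the standard greedy approximation for weighted set cover. This is a generic illustration, not the authors' implementation: the candidate objects, their costs, and the function name `greedy_min_cover` are hypothetical, and the best-first curve search that would generate candidates is not shown.

```python
def greedy_min_cover(universe, candidates):
    """Greedy weighted set cover.

    universe: iterable of pixel ids that must be explained.
    candidates: dict mapping a name to (cost, set_of_pixels).
    Returns (chosen_names_in_order, pixels_left_uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, float("inf")
        for name, (cost, pixels) in candidates.items():
            gain = len(pixels & uncovered)
            if gain == 0:
                continue  # this candidate explains nothing new
            ratio = cost / gain  # cost per newly covered pixel
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # remaining pixels cannot be covered by any candidate
        chosen.append(best_name)
        uncovered -= candidates[best_name][1]
    return chosen, uncovered

# Hypothetical candidates over six pixels.
candidates = {
    "a": (3.0, {0, 1, 2, 3}),
    "b": (1.0, {4, 5}),
    "c": (5.0, {0, 1, 2, 3, 4, 5}),
    "d": (1.0, {0}),
}
chosen, leftover = greedy_min_cover(range(6), candidates)
```

In this toy instance the greedy rule first picks "b" (0.5 per pixel) and then "a" (0.75 per pixel), for a total cost of 4.0, rather than the single candidate "c" at cost 5.0.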
Field-Programmable Gate Arrays (FPGAs) have become a mainstay in the digital electronics world, both for their ease of implementation and for their inherent usefulness in incrementally refining hardware designs. When moving to an Application Specific Integrated Circuit (ASIC) or System on a Chip (SoC), verification becomes a very time-consuming process, with virtually no room for error. As a result, a variety of methods have been devised to decrease the risk when creating an ASIC or SoC. We describe a hardware and software framework for testing real-time vision algorithms that lowers the uncertainty in FPGA and SoC development while reducing the SoC verification time. The framework benefits from hardware and software verification, ease of reconfiguration for testing multiple vision algorithms, and an iterative hardware/software co-design.
This paper presents a method to detect and construct a 3D geometric model of an urban area with complex buildings using aerial LIDAR (Light Detection and Ranging) data. The LIDAR data collected from a nadir direction is a point cloud containing surface samples of not only the building roofs and terrain but also undesirable clutter from trees, cars, etc. The main contribution of this work is the automatic recognition and estimation of simple parametric shapes that can be combined to model very complex buildings from aerial LIDAR data. The main components of the detection and modeling algorithms are: (i) Segmentation of roof and terrain points. (ii) Roof-topology inference. We introduce the concept of a roof-topology graph to represent the relationships between the various planar patches of a complex roof structure. (iii) Parametric roof composition. Simple parametric roof shapes that can be combined to create a complex roof structure of a building are recognized by searching for sub-graphs in its roof-topology graph. (iv) Terrain modeling. The terrain is identified and modeled as a triangulated mesh. Finally, we provide experimental results that demonstrate the validity of our approach for rapid and automatic building detection and geometric modeling with real LIDAR data. We are able to model cities and other urban areas at the rate of about 10 minutes per square mile on a low-end PC.
We introduce a technique to automatically correct color inconsistencies in a display composed of one or more digital light projectors (DLP). The method is agnostic to the source of error and can detect and address color problems from a number of sources. Examples include inter- and intra-projector color differences, display surface markings, and environmental lighting differences on the display. In contrast to methods that discover and map all colors into the greatest common color space, we minimize local color discontinuities to create color seamlessness while remaining tolerant to significant color error. The technique makes use of a commodity camera and high-dynamic-range sensing to measure color gamuts at many different spatial locations. A differentiable energy function is defined that combines a smoothness term and a data term. This energy function is globally minimized through the successive application of projective warps defined using gradient descent. At convergence the warps can be applied at runtime to minimize color defects in the display. The framework is demonstrated on displays that suffer from several sources of color
We present a novel boundary-based (discontinuity tracking) hierarchical statistical criterion to address the interactive contour extraction problem. Our criterion relies on a Markov chain representation of the boundary and can be efficiently optimized using Dijkstra’s algorithm for solving the shortest-path problem. Unlike other criteria optimized with Dijkstra’s algorithm, ours is capable of extracting geometrically complex boundaries even when the features incorporated in the objective function are based only on user markings on a small part of the image. The critical quantity in our criterion that yields the above-mentioned results is a normalization factor that boosts the probability of a particular boundary segment based on the candidate boundary segments in its vicinity. Although similar in spirit to the technique of non-maximum suppression routinely employed in edge detection, our method gradually boosts the probability of a particular segment given its surroundings using windows of increasing size in a hierarchical fashion.
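How a boundary can be extracted with Dijkstra's algorithm is sketched below on a small 4-connected pixel grid. The sketch assumes each pixel carries a non-negative cost (for example, a negative log-probability of lying on the boundary); the hierarchical normalization factor that is the abstract's actual contribution is not modeled, and the name `shortest_boundary` is hypothetical.

```python
import heapq

def shortest_boundary(cost, start, goal):
    """Cheapest 4-connected path from start to goal on a grid of pixel costs.

    cost: list of rows of non-negative numbers (required by Dijkstra).
    start, goal: (row, col) tuples.
    Returns (path_as_list_of_pixels, total_cost), or (None, inf) if unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]

# The middle column is expensive, so the path detours along the bottom row.
grid = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
path, total = shortest_boundary(grid, (0, 0), (0, 2))
```

In an interactive setting the start and goal pixels would come from user markings, and the cost map would be derived from the learned boundary statistics rather than fixed by hand as it is here.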
ISBN (print): 0769525970
The performance of many computer vision and machine learning algorithms critically depends on the quality of the similarity measure defined over the feature space. Previous works usually utilize metric distances, which are often epistemologically different from the perceptual distance of human beings. In this paper a novel non-metric partial similarity measure is introduced, which is designed to automatically capture the prominent partial similarity between two images while ignoring the confusing unimportant dissimilarity. This measure is potentially useful in face recognition since it can help identify the inherent intra-personal similarity and thus reduce the influence of large variations such as expression and occlusions. Moreover, to make this method practical, this paper proposes an automatic and class-dependent similarity threshold setting mechanism based on the maximal margin criterion, and uses a Self-Organization Map-based embedding technique to alleviate the computational problem. Experimental results show the feasibility and effectiveness of the proposed method.
We present a novel method to obtain a 3D Euclidean reconstruction of both the background and moving objects in a video sequence. We assume that multiple objects are moving rigidly on a ground plane observed by a moving camera. The video sequence is first segmented into static background and motion blobs by a homography-based motion segmentation method. Then classical "Structure from Motion" (SfM) techniques are applied to obtain a Euclidean reconstruction of the static background. The motion blob corresponding to each moving object is treated as if it were a static object observed by a hypothetical moving camera, called a "virtual camera". This virtual camera shares the same intrinsic parameters with the real camera but moves differently due to object motion. The same SfM techniques are applied to estimate the 3D shape of each moving object and the pose of the virtual camera. We show that the unknown scale of moving objects can be approximately determined by the ground plane, which is a key contribution of this paper. Another key contribution is that we prove that the 3D motion of moving objects can be solved from the virtual camera motion with a linear constraint imposed on the object translation. In our approach, a planar-translation constraint is formulated: "the 3D instantaneous translation of moving objects must be parallel to the ground plane". Results on real-world video sequences demonstrate the effectiveness and robustness of our approach.
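The planar-translation constraint quoted above is linear in the translation: if n is the ground-plane normal and t the object's instantaneous translation, being parallel to the plane means n · t = 0. The sketch below only checks and enforces that condition for a given vector; it does not reproduce the paper's derivation of object motion from the virtual camera, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def satisfies_planar_constraint(t, n, tol=1e-9):
    """True if translation t is parallel to the plane with normal n."""
    return abs(dot(t, n)) <= tol

def project_onto_plane(t, n):
    """Remove the component of t along n, so that the result satisfies n . t = 0."""
    nn = dot(n, n)
    if nn == 0:
        raise ValueError("plane normal must be non-zero")
    scale = dot(t, n) / nn
    return tuple(ti - scale * ni for ti, ni in zip(t, n))

# Ground plane z = 0 has normal (0, 0, 1): any vertical motion is removed.
t_raw = (1.0, 2.0, 3.0)
n = (0.0, 0.0, 1.0)
t_planar = project_onto_plane(t_raw, n)
```

Projecting a noisy estimate in this way is one simple means of enforcing the constraint after the fact; imposing it inside the estimation itself, as the abstract describes, is a different and stronger use of the same equation.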
Scene relighting has found applications in many areas, from movie special effects to building immersive environments. In this paper, we present a framework to render a scene under any prescribed lighting environment. First, we propose a 3D scanner that captures the scene geometry and reflectance simultaneously, by effectively using an uncalibrated camera together with light and shadows projected from a controlled lighting plane onto the scene. Then, a large camera array, composed of 48 cameras forming a planar array, is built to capture a lighting environment as a 4D incident light field. With the scene properties captured by the 3D scanner and the lighting environment captured by the camera array, we can then relight the scene using a technique we develop, called the adaptive environment map (AEM). AEM, which fully describes the lighting environment including both spatial and directional variations, can render more realistic relighting results than existing techniques in the literature, as shown in extensive experiments.