This paper presents two approaches for the representation and recognition of human action in video, aiming for view-point invariance. The paper first presents new results using a 2D approach presented earlier. Inherent limitations of the 2D approach are discussed, and a new 3D approach that builds on recent work on 3D model-based invariants is presented. Each action is represented as a unique curve in a 3D invariance space, surrounded by an acceptance volume ('action volume'). Given a video sequence, 2D quantities from the image are calculated and matched against candidate action volumes in a probabilistic framework. The theory is presented, followed by results on arbitrary projections of motion-capture data that demonstrate a high degree of tolerance to viewpoint change.
This article describes visual functions dedicated to the extraction and recognition of planar quadrangles detected from a single camera. Extraction is based on a relaxation scheme with constraints between image segments, while the characterization we propose allows recognition to be achieved from different view-points and under different viewing conditions. We defined and evaluated several metrics on this representation space: a correlation-based one and another based on sets of interest points.
ISBN: 0769519008 (print)
The performance of image retrieval with SVM active learning is known to be poor when it starts with only a few labeled images. In this paper, the problem is solved by incorporating unlabelled images into the bootstrapping of the learning process. The initial SVM classifier is trained with the few labeled images together with unlabelled images randomly selected from the image database. Both theoretical analysis and experimental results show that incorporating unlabelled images in the bootstrapping improves the efficiency of SVM active learning and thus the overall retrieval performance.
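The bootstrapping step described above can be sketched as follows. This is a minimal illustration of assembling the initial training set only, not the authors' implementation. The function name `build_bootstrap_set`, the image identifiers, and in particular the choice to give the randomly sampled unlabelled images a provisional negative label (-1) are assumptions made for this example; the abstract does not state how the unlabelled images are labelled.

```python
import random

def build_bootstrap_set(labeled, unlabeled_pool, n_unlabeled, seed=0):
    """Combine a few labeled images with randomly drawn unlabelled ones.

    labeled        -- list of (image_id, label) pairs, label in {+1, -1}
    unlabeled_pool -- list of image ids from the database
    n_unlabeled    -- how many unlabelled images to draw
    Assumption (not from the abstract): drawn images get label -1.
    """
    rng = random.Random(seed)
    labeled_ids = {image_id for image_id, _ in labeled}
    # Never draw an image that the user has already labeled.
    candidates = [i for i in unlabeled_pool if i not in labeled_ids]
    sampled = rng.sample(candidates, n_unlabeled)
    return list(labeled) + [(image_id, -1) for image_id in sampled]

# Hypothetical database of six images, two of them labeled by the user.
labeled = [("img_a", +1), ("img_b", -1)]
database = ["img_a", "img_b", "img_c", "img_d", "img_e", "img_f"]
training_set = build_bootstrap_set(labeled, database, n_unlabeled=3)
print(len(training_set))  # 5 training examples for the initial classifier
```

The resulting list would then be passed to whatever SVM trainer is in use; later active-learning rounds replace provisional labels with user feedback.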
This paper presents a representation for three-dimensional objects in terms of affine-invariant image patches and their spatial relationships. Multi-view constraints associated with groups of patches are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true three-dimensional affine and Euclidean models from multiple images and their recognition in a single photograph taken from an arbitrary viewpoint. The proposed approach does not require a separate segmentation stage and is applicable to cluttered scenes. Preliminary modeling and recognition results are presented.
This paper describes the implementation of a stereo depth measurement algorithm in hardware on field programmable gate arrays (FPGAs). The system generates 8-bit sub-pixel disparities on 256 by 360 pixel images at video rate (30 frames/sec). The algorithm implemented is a multi-resolution, multi-orientation phase-based technique called local weighted phase-correlation (Fleet, 1994). The hardware implementation runs more than 300 times faster than the same algorithm in software. In this paper, we describe the programmable hardware platform, the base stereo vision algorithm and the design of the hardware. We discuss the various trade-offs required to make the hardware small enough to fit on our system and fast enough to work at video rate. We also show sample outputs from the functioning hardware. Although this paper is specifically focused on phase-based stereo vision FPGA realizations, most of the design issues are common to other DSP and vision applications.
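To give a sense of the output rate implied by the figures above, here is a small arithmetic sketch. It uses only the numbers stated in the abstract (256 by 360 pixels, 30 frames/sec, 8 bits per disparity) and assumes one disparity value per pixel, which the abstract does not state explicitly; the function name is hypothetical.

```python
def disparities_per_second(dim_a, dim_b, frames_per_second):
    """Disparity values produced per second, assuming one per pixel."""
    return dim_a * dim_b * frames_per_second

# Figures taken from the abstract; the order of the two image
# dimensions does not affect the product.
rate = disparities_per_second(256, 360, 30)
bits_per_disparity = 8
bytes_per_second = rate * bits_per_disparity // 8

print(rate)              # disparity values per second
print(bytes_per_second)  # output bandwidth in bytes per second
```

Under that assumption the hardware must sustain roughly 2.76 million disparity values per second.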
In low-level vision, representations of scene properties such as shape and albedo are very high dimensional, as they have to describe complicated structures. The approach proposed here is to let the image itself bear as much of the representational burden as possible. In many situations, scene and image are closely related and it is possible to find a functional relationship between them. The scene information can be represented with reference to the image, where the functional specifies how to translate the image into the associated scene. We illustrate the use of this representation for encoding shape information. We show that this representation has appealing properties such as locality and slow variation across space and scale. These properties provide a way of improving shape estimates coming from other sources of information, such as stereo.
ISBN: 0769519008 (print)
This paper proposes an algorithm to clean up a large collection of historical handwritten documents kept in the National Archives of Singapore. Due to the seepage of ink over a long period of storage, the front page of each document has been severely marred by the reverse-side writing. Earlier attempts were made to match both sides of the page to identify the offending strokes originating from the back, so as to eliminate them with the aid of a wavelet transform. Perfect matching, however, is difficult due to document skew, differing resolutions, reverse sides inadvertently left out, and warped pages during image capture. An approach is now proposed that does away with double-sided mapping by using a directional wavelet transform, which distinguishes foreground and reverse-side strokes much better than the conventional wavelet transform. Experiments have shown that the directional wavelet operation significantly enhances the readability of each document without the need for mapping with its reverse side.
Virtually all methods in image processing and computer vision for removing weather effects from images assume single scattering of light by particles in the atmosphere. In reality, multiple scattering effects are significant. A common manifestation of multiple scattering is the appearance of glows around light sources in bad weather. Modeling multiple scattering is critical to understanding the complex effects of weather on images, and hence essential for improving the performance of outdoor vision systems. We develop a new physics-based model for the multiple scattering of light rays as they travel from a source to an observer. This model is valid for various weather conditions including fog, haze, mist and rain. Our model enables us to recover from a single image the shapes and depths of sources in the scene. In addition, the weather condition and the visibility of the atmosphere can be estimated. These quantities can, in turn, be used to remove the glows of sources to obtain a clear picture of the scene. Based on these results, we demonstrate that a camera observing a distant source can serve as a "visual weather meter". The model and techniques described in this paper can also be used to analyze scattering in other media, such as fluids and tissues. Therefore, in addition to vision in bad weather, our work has implications for medical and underwater imaging.
Corner measurement is a central concern in the following tasks: camera calibration, image matching, object tracking, recognition and reconstruction. This paper presents a hybrid evolutionary ridge regression approach to the problem of corner modeling. We search for the parameters characterizing L-corner models by fitting the model to the image data. As the model fitting relies on an initial parameter estimate, we use a global approach to find the global minimum. Experimental results on an L-corner at several noise levels show the advantages and disadvantages of our evolutionary algorithm compared to down-hill simplex and simulated annealing.
ISBN: 0819448338 (print)
The task of segmenting the posterior ribs within the lung fields is of great practical importance. For example, delineation of the ribs may lead to a decreased number of false positives in computerized detection of abnormalities, and hence analysis of radiographs for computer-aided diagnosis purposes will benefit from this. We use an iterative, pixel-based, statistical classification method, iterated contextual pixel classification (ICPC). It is suited for a complex segmentation task in which a global shape description is hard to provide. The method combines local gray level and contextual information to arrive at an overall image segmentation. Because of its generality, it is also useful for other segmentation tasks. In our case, the variable number of visible ribs in the lung fields complicates the use of a global model. Additional difficulties arise from the poor visibility of the lower and medial ribs. Using cross validation, the method is evaluated on 35 radiographs in which all posterior ribs were traced manually. ICPC obtains an accuracy of 83%, a sensitivity of 79%, and a specificity of 86% for segmenting the costal space. Further evaluation is done using five manual segmentations from a second observer, whose performance is compared with the five corresponding images from the first manual segmentation, yielding 83% accuracy, 84% sensitivity, and 83% specificity. On these five images, ICPC attains 82%, 78%, and 86% respectively.
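For readers unfamiliar with the three reported measures, the sketch below shows how accuracy, sensitivity, and specificity are conventionally computed from per-pixel counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the paper, and which class counts as "positive" is an assumption of the example.

```python
def segmentation_scores(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from pixel counts.

    tp/fn -- positive-class pixels classified correctly / incorrectly
    tn/fp -- negative-class pixels classified correctly / incorrectly
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total     # fraction of all pixels that are correct
    sensitivity = tp / (tp + fn)     # fraction of positives that were found
    specificity = tn / (tn + fp)     # fraction of negatives that were rejected
    return accuracy, sensitivity, specificity

# Hypothetical counts for a tiny 150-pixel example.
acc, sens, spec = segmentation_scores(tp=40, fp=10, tn=90, fn=10)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because accuracy pools both classes, it always lies between sensitivity and specificity, weighted by how many pixels each class contains.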