In this paper we describe a novel generative model for video analysis called the transformed hidden Markov model (THMM). The video sequence is modeled as a set of frames generated by transforming a small number of class images that summarize the sequence. For each frame, the transformation and the class are discrete latent variables that depend on the previous class and transformation in the sequence. The set of possible transformations is defined in advance, and it can include a variety of transformation such as translation, rotation and shearing. In each stage of such a Markov model, a new frame is generated from a transformed Gaussian distribution based on the class/transformation combination generated by the Markov chain. This model can be viewed as an extension of a transformed mixture of Gaussians [1] through time. We use this model to cluster unlabeled video segments and form a video summary in an unsupervised fashion. We also use the trained models to perform tracking, image stabilization and filtering. We demonstrate that the THMM is capable of combining long term dependencies in video sequences (repeating similar frames in remote parts of the sequence) with short term dependencies (such as short term image frame similarities and motion patterns) to better summarize and process a video sequence even in the presence of high levels of white or structured noise (such as foreground occlusion).
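To make the generative process concrete, here is a minimal sketch, assuming toy parameters (random transition matrices, 1D "images", and circular shifts standing in for the predefined transformation set). It is not the authors' implementation, only an illustration of how class and transformation states evolve as coupled Markov chains and how each frame is drawn from a transformed Gaussian.

```python
# Minimal THMM-style sampling sketch (illustrative, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_transforms, n_pixels, T = 3, 4, 64, 10

# Toy parameters; in practice these would be learned from the video (e.g. by EM).
class_trans = rng.dirichlet(np.ones(n_classes), size=n_classes)        # P(c_t | c_{t-1})
shift_trans = rng.dirichlet(np.ones(n_transforms), size=n_transforms)  # P(s_t | s_{t-1})
class_means = rng.normal(size=(n_classes, n_pixels))                   # latent class "images"
noise_std = 0.1

def apply_transform(image, s):
    # Stand-in for the predefined transformation set (here: circular shifts).
    return np.roll(image, s)

c, s = rng.integers(n_classes), rng.integers(n_transforms)  # initial states
frames = []
for t in range(T):
    # Class and transformation each follow their own Markov chain.
    c = rng.choice(n_classes, p=class_trans[c])
    s = rng.choice(n_transforms, p=shift_trans[s])
    # Each frame is drawn from a Gaussian centred on the transformed class image.
    frames.append(apply_transform(class_means[c], s)
                  + noise_std * rng.normal(size=n_pixels))

video = np.stack(frames)  # (T, n_pixels) synthetic sequence
```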
ISBN (Print): 078036466X
In radiation therapy, digital images taken both from the simulator and from the verification system of the treatment unit, with respect to the planned and irradiated field, are essential to the patient's treatment. The correction of the distortion in these image modalities is a prerequisite for quantifying the treatment set-up accuracy. The method proposed here uses a well-defined rectangular point grid placed at a specific distance from the focus, between the object and the imaging chain. We have used a graph-theory model to reconstruct the whole grid despite the lack of information caused by overlapping structures. The fact that, in most cases, the irradiation fields in radiotherapy are bigger than the field of view of an image intensifier makes it necessary to acquire several segments of the planned field and to combine them into a single image using pattern-recognition techniques. The dependency of this distortion on the distance of the image intensifier from the focus of the x-ray beam and on the angle of the simulator's gantry is demonstrated. The application of the proposed method has resulted in an 80% reduction of the distortion, to 1.5 mm at the edges of the image intensifier. The performance of the model could be improved even further by considering both a radial and a tangential distortion factor.
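As an illustration of one plausible correction step, the sketch below fits a two-coefficient radial distortion model to detected grid points against their ideal positions and then undistorts points by fixed-point iteration. The model, the known distortion centre, and the function names are assumptions, not the method's actual parameterisation (which, as noted above, could also include a tangential term).

```python
# Illustrative radial-distortion fit from grid correspondences (assumed model).
import numpy as np

def fit_radial_distortion(detected, ideal, centre):
    """Least-squares fit of x_d = x_u * (1 + k1*r^2 + k2*r^4) around a known centre.

    detected, ideal: (N, 2) arrays of grid point coordinates; centre: (2,)."""
    du = ideal - centre       # undistorted offsets
    dd = detected - centre    # distorted offsets
    r2 = np.sum(du**2, axis=1)
    A = np.column_stack([(du * r2[:, None]).ravel(),
                         (du * (r2**2)[:, None]).ravel()])
    b = (dd - du).ravel()
    k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return k  # [k1, k2]

def undistort(points, centre, k, iters=10):
    """Invert the radial model by fixed-point iteration."""
    p = points - centre
    u = p.copy()
    for _ in range(iters):
        r2 = np.sum(u**2, axis=1)
        u = p / (1 + k[0] * r2 + k[1] * r2**2)[:, None]
    return u + centre
```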
This paper addresses the derivation of likelihood functions and confidence bounds for problems involving over-determined linear systems with noise in all measurements, often referred to as total-least-squares (TLS). It has been shown previously that TLS provides maximum likelihood estimates. But rather than being a function solely of the variables of interest, the associated likelihood functions increase in dimensionality with the number of equations. This has made it difficult to derive suitable confidence bounds, and impractical to use these probability functions with Bayesian belief propagation or Bayesian tracking. This paper derives likelihood functions that are defined only on the parameters of interest. This has two main advantages: first, the likelihood functions are much easier to use within a Bayesian framework; and second, it is straightforward to obtain a reliable confidence bound on the estimates. We demonstrate the accuracy of our confidence bound in relation to others that have been proposed. Also, we use our theoretical results to obtain likelihood functions for estimating the direction of 3D camera translation.
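For context, the sketch below shows the classical SVD solution of a total-least-squares problem A x ≈ b with noise in both A and b; the paper's contribution concerns likelihood functions and confidence bounds around such estimates, which this toy example does not reproduce.

```python
# Classical SVD solution of total least squares (standard result, for context).
import numpy as np

def tls_solve(A, b):
    """TLS estimate of x for the over-determined system A x ≈ b (noise in A and b)."""
    n = A.shape[1]
    C = np.hstack([A, b.reshape(-1, 1)])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                      # right singular vector of smallest singular value
    if abs(v[n]) < 1e-12:
        raise ValueError("degenerate TLS problem")
    return -v[:n] / v[n]

# Toy example with noise in both A and b.
rng = np.random.default_rng(1)
x_true = np.array([2.0, -1.0])
A_clean = rng.normal(size=(50, 2))
A = A_clean + 0.05 * rng.normal(size=A_clean.shape)
b = A_clean @ x_true + 0.05 * rng.normal(size=50)
print(tls_solve(A, b))  # close to [2, -1]
```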
This paper proposes a novel technique for computing geometric information from images captured under parallel projections. Parallel images are desirable for stereo reconstruction because parallel projection significantly reduces foreshortening. As a result, correlation-based matching becomes more effective. Since parallel projection cameras are not commonly available, we construct parallel images by rebinning a large sequence of perspective images. Epipolar geometry, depth recovery and projective invariants for both 1D and 2D parallel stereo are studied. From the uncertainty analysis of depth reconstruction, it is shown that parallel stereo is superior to both conventional perspective stereo and the recently developed multiperspective stereo for vision reconstruction, in that uniform reconstruction error is obtained in parallel stereo. Traditional stereo reconstruction techniques, e.g. multi-baseline stereo, remain applicable to parallel stereo without modification because epipolar lines in parallel stereo are perfectly straight. Experimental results further confirm the performance of our approach.
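A minimal sketch of the geometric intuition, under assumed viewing angles, pixel size, focal length and baseline: in parallel stereo, depth is linear in disparity, so a fixed one-pixel matching error produces a uniform depth error, whereas in perspective stereo the same error grows with depth.

```python
# Depth-from-disparity comparison under assumed geometry (illustrative only).
import numpy as np

def parallel_depth(disparity_px, pixel_size, theta1, theta2):
    # Parallel projections at viewing angles theta1, theta2:
    # z = d / (tan(theta1) - tan(theta2)), i.e. depth is linear in disparity.
    return disparity_px * pixel_size / (np.tan(theta1) - np.tan(theta2))

def perspective_depth(disparity_px, pixel_size, focal, baseline):
    # Conventional stereo: z = f * B / d, so depth error grows with depth squared.
    return focal * baseline / (disparity_px * pixel_size)

for disp in (100.0, 10.0):  # a near point and a far point
    e_par = (parallel_depth(disp + 1, 1e-5, np.radians(5), np.radians(-5))
             - parallel_depth(disp, 1e-5, np.radians(5), np.radians(-5)))
    e_per = (perspective_depth(disp - 1, 1e-5, 0.01, 0.1)
             - perspective_depth(disp, 1e-5, 0.01, 0.1))
    print(f"1-px error: parallel {e_par:.2e} m (uniform), perspective {e_per:.2e} m")
```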
Many visual matching algorithms can be described in terms of the features and the inter-feature distance or metric. The most commonly used metric is the sum of squared differences (SSD), which is valid from a maximum likelihood perspective when the real noise distribution is Gaussian. Based on real noise distributions measured from international test sets, we have found experimentally that the Gaussian noise distribution assumption is often invalid. This implies that other metrics, which have distributions closer to the real noise distribution, should be used. In this paper we considered two different visual matching applications: content-based retrieval in image databases and stereo matching. Towards broadening the results, we also implemented several sophisticated algorithms from the research literature. In each algorithm we compared the efficacy of the SSD metric, the SAD (sum of the absolute differences) metric, the Cauchy metric, and the Kullback relative information. Furthermore, in the case where sufficient training data is available, we discussed and experimentally tested a new metric based directly on the real noise distribution, which we denoted the maximum likelihood metric.
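For reference, the four inter-feature distances discussed above can be written down directly; the Cauchy scale parameter `a` below is an assumed placeholder, whereas in the experiments it would be fitted to the measured noise distribution.

```python
# The four inter-feature distances, written out for two feature vectors
# (normalised histograms in the Kullback case).
import numpy as np

def ssd(x, y):
    return np.sum((x - y) ** 2)

def sad(x, y):
    return np.sum(np.abs(x - y))

def cauchy(x, y, a=1.0):
    # In practice `a` would be estimated from the measured noise distribution.
    return np.sum(np.log1p(((x - y) / a) ** 2))

def kullback(p, q, eps=1e-12):
    p = p / p.sum()
    q = q / q.sum()
    return np.sum(p * np.log((p + eps) / (q + eps)))
```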
This paper investigates the use of colour and texture cues for segmentation of images within two specified domains. The first is the Sowerby dataset, which contains one hundred colour photographs of country roads in England that have been interactively segmented and classified into six classes: edge, vegetation, air, road, building, and other. The second domain is a set of thirty-five images, taken in San Francisco, which have been interactively segmented into similar classes. In each domain we learn the joint probability distributions of filter responses, based on colour and texture, for each class. These distributions are then used for classification. We restrict ourselves to a limited number of filters in order to ensure that the learnt filter responses do not overfit the training data (our region classes are chosen so as to ensure that there is enough data to avoid overfitting). We analyse performance on the two datasets by evaluating the false positive and false negative error rates for the classification. This shows that the learnt models achieve high accuracy in classifying individual pixels into those classes for which the filter responses are approximately spatially homogeneous (i.e. road, vegetation, and air but not edge and building). A more sensitive performance measure, the Chernoff information, is calculated in order to quantify how well the cues for edge and building perform. This demonstrates that statistical knowledge of the domain is a powerful tool for segmentation.
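A minimal sketch of this kind of pixel classifier, with the simplifying assumption that the joint filter-response distribution is approximated by independent per-filter histograms; the bin count, smoothing, and function names are illustrative, not the paper's exact representation.

```python
# Sketch of a per-class filter-response classifier (assumed representation:
# independent per-filter histograms rather than the paper's joint distributions).
import numpy as np

N_BINS = 32

def learn_class_histograms(responses_by_class):
    """responses_by_class: dict class_name -> (num_pixels, num_filters) array of
    colour/texture filter responses scaled to [0, 1]."""
    models = {}
    for name, resp in responses_by_class.items():
        hists = [np.histogram(resp[:, f], bins=N_BINS, range=(0, 1), density=True)[0] + 1e-6
                 for f in range(resp.shape[1])]   # smooth empty bins
        models[name] = np.stack(hists)            # (num_filters, N_BINS)
    return models

def classify_pixel(filter_response, models):
    """Assign the class with the highest log-likelihood for one pixel's responses."""
    bins = np.clip((filter_response * N_BINS).astype(int), 0, N_BINS - 1)
    scores = {name: np.sum(np.log(h[np.arange(len(bins)), bins]))
              for name, h in models.items()}
    return max(scores, key=scores.get)
```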
While several image-based rendering techniques have been proposed to successfully render scenes/objects from a large collection (e.g., thousands) of images without explicitly recovering 3D structures, the minimum number of images needed to achieve a satisfactory rendering result remains an open problem. This paper is the first attempt to investigate the lower bound for the number of samples needed in Lumigraph/light field rendering. To simplify the analysis, we consider an ideal scene containing only a single point lying between a minimum and a maximum depth. Furthermore, a constant-depth assumption and bilinear interpolation are used for rendering. The constant-depth assumption serves to choose 'nearby' rays for interpolation. Our criterion for determining the lower bound is to avoid horizontal and vertical double images, which are caused by interpolation using multiple nearby rays. This criterion is based on the causality requirement in scale-space theory, i.e., no 'spurious details' should be generated while smoothing. Using this criterion, closed-form solutions of the lower bounds are obtained for both the 3D plenoptic function (Concentric Mosaics) and the 4D plenoptic function (light field). The bounds are derived entirely from geometry and are closely related to the resolution of the camera and the depth range of the scene. These lower bounds are further verified by our experimental results.
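The sketch below is a rough numerical proxy for the "no double image" criterion, assuming a first-order ghosting model (offset ≈ camera spacing × focal length × |1/z − 1/z0|) and an optimally chosen constant depth; it is not the paper's closed-form bound, only an illustration of how the depth range and pixel size drive the minimum number of cameras.

```python
# Rough proxy for the "no double image" criterion (assumptions: first-order
# ghosting model, evenly spaced cameras, optimally chosen constant depth).
import numpy as np

def double_image_offset(dt, f, z, z0):
    # Approximate image-plane offset between nearby rays for a point at depth z
    # rendered with constant-depth assumption z0, cameras dt apart, focal length f.
    return dt * f * abs(1.0 / z - 1.0 / z0)

def min_cameras(track_length, f, pixel, z_min, z_max):
    """Smallest number of evenly spaced cameras keeping the worst-case offset <= 1 px."""
    z0 = 2.0 / (1.0 / z_min + 1.0 / z_max)   # harmonic mean: optimal constant depth
    worst = lambda dt: max(double_image_offset(dt, f, z, z0) for z in (z_min, z_max))
    n = 2
    while worst(track_length / (n - 1)) > pixel:
        n += 1
    return n

# e.g. a 1 m camera track, 10 mm focal length, 10 micron pixels, depths 1-5 m
print(min_cameras(track_length=1.0, f=0.01, pixel=1e-5, z_min=1.0, z_max=5.0))
```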
Motion estimation in image sequences is an important step in many computer vision and image processing applications. Several methods for solving this problem have been proposed, but very few manage to achieve a high l...
This paper introduces a method to calibrate a wide-area system of unsynchronized cameras with respect to a single global coordinate system. The method is simple and does not require the physical construction of a large calibration object. The user need only wave an identifiable point in front of all cameras. The method generates a rough estimate of camera pose by first performing pair-wise structure-from-motion on observed points, and then combining the pair-wise registrations into a single coordinate frame. Using the initial camera pose, the moving point can be tracked in world space. The path of the point defines a 'virtual calibration object' which can be used to improve the initial estimates of camera pose. Iterating the above process yields a more precise estimate of both the camera pose and the point path. Experimental results show that it performs as well as calibration from a physical target in cases where all cameras share some common working volume. We then demonstrate its effectiveness in wide-area settings by calibrating a system of cameras in a configuration where traditional methods cannot be applied directly.
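One step a sketch can make concrete is merging pair-wise reconstructions of the waved point's path into a single frame; below is an assumed Umeyama-style similarity alignment between corresponding 3D points, which is one standard way to perform such a registration, not necessarily the authors' exact procedure.

```python
# Assumed registration step: Umeyama-style similarity alignment of two pair-wise
# reconstructions of the tracked point path (src, dst: corresponding (N, 3) points).
import numpy as np

def similarity_align(src, dst):
    """Least-squares fit of dst ≈ s * R @ src + t."""
    n = len(src)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    var_s = ((src - mu_s) ** 2).sum() / n
    cov = (dst - mu_d).T @ (src - mu_s) / n
    U, D, Vt = np.linalg.svd(cov)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U) * np.linalg.det(Vt))])
    R = U @ S @ Vt                        # rotation with det +1
    s = np.trace(np.diag(D) @ S) / var_s  # scale
    t = mu_d - s * R @ mu_s               # translation
    return s, R, t

# Usage: map one reconstruction's points (and camera centres) into the other frame.
# s, R, t = similarity_align(path_in_frame_b, path_in_frame_a)
# aligned = s * path_in_frame_b @ R.T + t
```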
This paper proposes a new expert system, IMPRESS-Pro, that automatically constructs an image processing procedure to decide whether there exists a specific figure in an image or not based on the requirement of misclassi...