ISBN: (print) 0769519008
Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods enhanced recognition by using multiple frame averaging, image interpolation and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge sources using mixture models, and describe a learning approach based on Expectation-Maximization (EM). In order to handle unseen words, a back-off smoothing approach derived from the Bayesian model is also presented. We explored a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed caption model is based on a unique time distance distribution model of videotext words and closed caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. The proposed methods also reduce false videotext detection significantly, with a false alarm rate of 8.2% without substantial loss of recall.
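As a rough illustration of how EM can learn the weight of a two-component mixture, the sketch below assumes the two component models are fixed and that each observed word already has a probability under each of them. The function name `em_mixture_weight` and its inputs are hypothetical; this is not the paper's full Bayesian framework, which also covers back-off smoothing and the time distance model.

```python
def em_mixture_weight(p_a, p_b, iterations=50, lam=0.5):
    """Estimate lam in p(w) = lam * p_a(w) + (1 - lam) * p_b(w).

    p_a, p_b: probabilities of each observed word under two fixed
    component models (hypothetical inputs for this sketch).
    """
    if len(p_a) != len(p_b) or not p_a:
        raise ValueError("need two non-empty lists of equal length")
    for _ in range(iterations):
        # E-step: posterior responsibility of component A per observation.
        responsibilities = []
        for a, b in zip(p_a, p_b):
            total = lam * a + (1.0 - lam) * b
            responsibilities.append(lam * a / total if total > 0 else 0.5)
        # M-step: the new weight is the mean responsibility.
        lam = sum(responsibilities) / len(responsibilities)
    return lam
```

When both components assign identical probabilities, the weight stays at its initial value; when one component consistently explains the observations better, its weight grows toward 1.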
We introduce a new method to describe shape relationships over time in a photograph. We acquire both range and image information in a sequence of frames using a stationary stereo camera. From the pictures taken, we compute a composite image consisting of the pixels from the surfaces closest to the camera over all the time frames. Through occlusion cues, this composite reveals 3-D relationships between the shapes at different times. We call the composite a shape-time photograph. Small errors in stereo depth measurements can create artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front-surface pixel, taking into account (a) the stereo depth measurements and their uncertainties, and (b) spatial continuity assumptions for the time-frame assignments of the front-surface pixels.
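The compositing step described here, taking each pixel from the frame whose surface is closest to the camera, can be sketched as below. The nested-list layout (`images[t][y][x]`, `depths[t][y][x]`) and the function name are assumptions made for illustration, and the sketch omits the Markov network correction for depth errors.

```python
def shape_time_composite(images, depths):
    """Build a front-surface composite over all time frames.

    images[t][y][x] is a pixel value and depths[t][y][x] its measured
    range to the camera (layout assumed for this sketch).
    Returns the composite image and the per-pixel source frame index.
    """
    height, width = len(images[0]), len(images[0][0])
    composite = [[None] * width for _ in range(height)]
    source_frame = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pick the time frame whose surface is closest to the camera.
            t_best = min(range(len(images)), key=lambda t: depths[t][y][x])
            composite[y][x] = images[t_best][y][x]
            source_frame[y][x] = t_best
    return composite, source_frame
```

The `source_frame` map is the per-pixel time-frame assignment that a smoothing model would then regularize.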
The aim of this paper is to find the best representation for the appearance of surfaces with Lambertian reflectance under varying illumination. Previous work using principal component analysis (PCA) found the best sub-space to represent all images of an object under a varying point light source. We extend this to images from any illumination distribution. Specifically, we calculate the bases for all configurations of a point plus ambient light source and two point light sources, as well as from a database of captured real world illumination. We also reformulate the optimization criterion used in PCA. The resulting basis, we believe, has higher representability and is better for analyzing images of shaded objects. The different bases are compared on a database of images to test the representability.
Motion blur due to camera motion can significantly degrade the quality of an image. Since the path of the camera motion can be arbitrary, deblurring of motion blurred images is a hard problem. Previous methods to deal with this problem have included blind restoration of motion blurred images, optical correction using stabilized lenses, and special CMOS sensors that limit the exposure time in the presence of motion. In this paper, we exploit the fundamental tradeoff between spatial resolution and temporal resolution to construct a hybrid camera that can measure its own motion during image integration. The acquired motion information is used to compute a point spread function (PSF) that represents the path of the camera during integration. This PSF is then used to deblur the image. To verify the feasibility of hybrid imaging for motion deblurring, we have implemented a prototype hybrid camera. This prototype system was evaluated in different indoor and outdoor scenes using long exposures and complex camera motion paths. The results show that, with minimal resources, hybrid imaging outperforms previous approaches to the motion blur problem.
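To make the idea of a PSF that represents the camera path concrete, here is a minimal sketch that accumulates a sampled path into a normalized kernel. It assumes the path is already available as pixel offsets sampled at uniform time steps; `psf_from_path` is a hypothetical helper, and the paper's actual PSF estimation from the hybrid camera's motion measurements is not reproduced here.

```python
def psf_from_path(path, size):
    """Rasterize a camera path into a normalized blur kernel.

    path: list of (dx, dy) pixel offsets relative to the start position,
    sampled at uniform time steps (an assumption of this sketch).
    size: side length of the square kernel.
    """
    kernel = [[0.0] * size for _ in range(size)]
    center = size // 2
    for dx, dy in path:
        col = center + int(round(dx))
        row = center + int(round(dy))
        # Samples that fall outside the kernel are ignored.
        if 0 <= row < size and 0 <= col < size:
            kernel[row][col] += 1.0
    total = sum(sum(row) for row in kernel)
    if total == 0:
        raise ValueError("no path samples fall inside the kernel")
    # Normalize so the kernel preserves overall image brightness.
    return [[value / total for value in row] for row in kernel]
```

Positions where the camera dwells longer receive more weight, which is why uniform time sampling matters in this simplified version.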
We question the role that large scale filter banks have traditionally played in texture classification. It is demonstrated that textures can be classified using the joint distribution of intensity values over extremely compact neighborhoods (starting from as small as 3 × 3 pixels square), and that this outperforms classification using filter banks with large support. We develop a novel texton based representation, which is suited to modeling this joint neighborhood distribution for MRFs. The representation is learnt from training images, and then used to classify novel images (with unknown viewpoint and lighting) into texture classes. The power of the method is demonstrated by classifying over 2800 images of all 61 textures present in the Columbia-Utrecht database. The classification performance surpasses that of recent state-of-the-art filter bank based classifiers such as Leung & Malik, Cula & Dana, and Varma & Zisserman.
Scene content understanding facilitates a large number of applications, ranging from content-based image retrieval to other multimedia applications. Material detection refers to the problem of identifying key semantic material types (such as sky, grass, foliage, water, and snow) in images. In this paper, we present a holistic approach to determining scene content, based on a set of individual material detection algorithms, as well as probabilistic spatial context models. A major limitation of individual material detectors is the significant number of misclassifications that occur because of the similarities in color and texture characteristics of various material types. We have developed a spatial context-aware material detection system that reduces misclassification by constraining the beliefs to conform to the probabilistic spatial context models. Experimental results show that the accuracy of material detection is improved by 13% using the spatial context models over the individual material detectors themselves.
We present a fast, robust and automatic method for computing central paths through tubular structures for application to virtual endoscopy. The key idea is to utilize a medial surface algorithm, which exploits properties of the average outward flux of the gradient vector field of a Euclidean distance function to the boundary of the structure of interest. The algorithm is modified to yield a collection of 3D curves, each of which is locally centered. The approach requires no user interaction, is virtually parameter free, and has low computational complexity. We illustrate the approach on segmented colon and vessel data.
E-learning has received more and more attention in recent years. The abundant text information in E-learning videos is very valuable for information indexing, searching and other applications. In order to effectively extract the text from E-learning videos, a text processing method is proposed in this paper. The method is composed of two parts: text change frame detection and text extraction from images. The purpose of text change frame detection is to remove the redundant frames from the video and reduce the total processing time. A new text extraction algorithm is proposed to extract the text areas in the text change frames for further recognition. Experiments on lecture videos demonstrate the good performance of our method.
We introduce a face representation, the shape trace transform (STT), for recognizing faces in an authentication system. The STT offers an alternative representation for faces that has a very high discriminatory power. We estimate the dissimilarity between two shapes by a new measure that we propose, the Hausdorff context. Reinforcement learning is used to search for the optimal parameters of the algorithm, for which the within class variance of the STT is minimized. This research demonstrates that the proposed method provides a new way for face representation. Our system is verified with experiments on the XM2VTS database.
We introduce the problem of repetitive nearest neighbor search in relevance feedback and propose an efficient search scheme for high dimensional feature spaces. Relevance feedback learning is a popular scheme used in content based image and video retrieval to support high-level concept queries. The paper addresses those scenarios in which a similarity or distance matrix is updated during each iteration of the relevance feedback search and a new set of nearest neighbors is computed. This repetitive nearest neighbor computation in high dimensional feature spaces is expensive, particularly when the number of items in the data set is large. In this context, we suggest a search algorithm that supports relevance feedback for the general quadratic distance metric. The scheme exploits correlations between two consecutive nearest neighbor sets, thus significantly reducing the overall search complexity. Detailed experimental results are provided using a 60-dimensional texture feature dataset.
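For reference, the general quadratic distance between an item x and a query q is (x - q)^T W (x - q), where W is the weight matrix that relevance feedback updates at each iteration. The sketch below is a brute-force baseline that recomputes every distance from scratch, which is the expensive computation the abstract refers to; it is not the paper's efficient scheme, and the function names are illustrative.

```python
def quadratic_distance(x, q, weights):
    """Return (x - q)^T W (x - q) for a square matrix W given as nested lists."""
    diff = [xi - qi for xi, qi in zip(x, q)]
    n = len(diff)
    return sum(diff[i] * weights[i][j] * diff[j]
               for i in range(n) for j in range(n))


def k_nearest(items, query, weights, k):
    """Brute-force search: indices of the k items closest to the query."""
    order = sorted(range(len(items)),
                   key=lambda i: quadratic_distance(items[i], query, weights))
    return order[:k]
```

Changing `weights` between iterations can reorder the neighbors even though the items and query are unchanged, which is why each feedback round triggers a new search.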