This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next, we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e., the feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade of 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set, and the unread text is typically small and distant from the viewer.
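The boosting step mentioned in this abstract can be illustrated with a minimal sketch of discrete AdaBoost. It assumes one-dimensional toy data and simple threshold stumps as weak classifiers; the names `stump_predict`, `best_stump`, `train_adaboost`, and `predict` are hypothetical, and the paper's own weak classifiers (built from joint probabilities of feature responses) and its cascade are not reproduced here.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: +1 when polarity * (x - threshold) >= 0, else -1."""
    return 1 if polarity * (x - threshold) >= 0 else -1

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def train_adaboost(xs, ys, rounds):
    """Train a strong classifier as a weighted vote of stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
strong = train_adaboost(xs, ys, rounds=3)
training_errors = sum(predict(strong, x) != y for x, y in zip(xs, ys))
```

On this toy set a single stump misclassifies some samples, while the weighted vote of three stumps fits the training labels; a cascade, as in the abstract, would chain several such strong classifiers so that easy non-text regions are rejected early.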
A fast and efficient multiple layer background maintenance model is built to conserve the original and the current background separately. Fusing the properties of object motion in image pixels and the changes between ...
Color is a distinctive feature widely used for object representation and tracking. However, color-based tracking is often affected by background clutter and illumination variation. This paper presents a robust color...
ISBN: (print) 0769521274
The proceedings contain 68 papers from the 1st Canadian Conference on Computer and Robot Vision 2004. The topics discussed include: visual tracking using adaptive appearance models; the extension of statistical face detection to face tracking; an optical-inertial tracking system for fully-enclosed VR displays; real-time motion tracker for a robotic vision system; Bayesian segmentation supported by neighborhood configurations; and unsupervised segmentation of synthetic aperture radar sea ice imagery using MRF models.
ISBN: (print) 0769521274
Choosing unique and invariant features is the first important step in object tracking. In this paper, we present a method to find proper-sized and irregularly-shaped trackable features, the use of which can outperform procedures using normal square features. The notion of confidence associated with each feature is introduced as the feature propagates. The use of confidence results in robust tracking even when occlusion is present. Based on the translational displacement of each feature, the affine motion of the object can be accurately estimated. This approach has been tested on a wide variety of video sequences and produces good tracking results.
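The step from per-feature translations to an affine motion estimate can be sketched as a weighted least-squares fit. This is an illustration under stated assumptions, not the paper's procedure: the function names `solve3` and `estimate_affine` are hypothetical, and using the per-feature confidence as a least-squares weight is an assumption made here for the example.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("need at least 3 non-collinear features")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, 3))
        solution[r] = (aug[r][3] - tail) / aug[r][r]
    return solution

def estimate_affine(points, moved, weights=None):
    """Fit x' = a*x + b*y + tx and y' = c*x + d*y + ty in the least-squares sense.

    points:  feature positions (x, y) in the previous frame
    moved:   the same features after their translational displacement
    weights: optional per-feature confidences (assumed to act as LS weights)
    Returns ((a, b, tx), (c, d, ty)).
    """
    if weights is None:
        weights = [1.0] * len(points)
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_x = [0.0] * 3
    rhs_y = [0.0] * 3
    for (x, y), (u, v), w in zip(points, moved, weights):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                normal[i][j] += w * row[i] * row[j]
            rhs_x[i] += w * row[i] * u
            rhs_y[i] += w * row[i] * v
    return tuple(solve3(normal, rhs_x)), tuple(solve3(normal, rhs_y))

# Four features moved by a known affine map; the fit should recover it.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tracked = [(1.1 * x + 0.2 * y + 3.0, -0.1 * x + 0.9 * y - 2.0) for x, y in features]
row_x, row_y = estimate_affine(features, tracked)
```

With weights available, features that are occluded or poorly matched could be given low confidence so that they contribute little to the fit.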
In this paper, we develop a new video-to-video face recognition algorithm. The major advantage of the video-based method is that more information is available in a video sequence than in a single image. In order to take advantage of the large amount of information in the video sequence and at the same time overcome the processing speed and data size problems, we develop several new techniques, including temporal and spatial frame synchronization and multi-level subspace analysis for video cube processing. The method preserves all the spatial-temporal information contained in a video sequence. Near-perfect classification results are obtained on the XM2VTS face video database.
Numerical methods associated with graph-theoretic image processing algorithms often reduce to the solution of a large linear system. We show here that choosing a topology that yields a small graph diameter can greatly speed up the numerical solution. As a proof of concept, we examine two image graphs that preserve local connectivity of the nodes (pixels) while drastically reducing the graph diameter. The first is based on a "small-world" modification of a standard 4-connected lattice. The second is based on a quadtree graph. Using a recently described graph-theoretic image processing algorithm, we show that a large speed-up is achieved with minimal perturbation of the solution when these graph topologies are utilized. We suggest that a variety of similar algorithms may also benefit from this approach.
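The effect of a small-world modification on graph diameter can be checked with a short sketch. It builds a 4-connected pixel lattice, adds a few shortcut edges, and measures the diameter by breadth-first search. Choosing the shortcuts uniformly at random is an assumption made here (the abstract does not say how they are placed), and the names `lattice`, `add_shortcuts`, and `diameter` are hypothetical.

```python
import random
from collections import deque

def lattice(n):
    """Adjacency sets of an n x n 4-connected pixel lattice."""
    adj = {(r, c): set() for r in range(n) for c in range(n)}
    for r in range(n):
        for c in range(n):
            if r + 1 < n:
                adj[(r, c)].add((r + 1, c))
                adj[(r + 1, c)].add((r, c))
            if c + 1 < n:
                adj[(r, c)].add((r, c + 1))
                adj[(r, c + 1)].add((r, c))
    return adj

def add_shortcuts(adj, count, seed=0):
    """Add `count` random long-range edges; local edges are left untouched."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    for _ in range(count):
        a, b = rng.sample(nodes, 2)
        adj[a].add(b)
        adj[b].add(a)

def eccentricity(adj, source):
    """Largest shortest-path distance from `source`, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())

def diameter(adj):
    """Graph diameter as the maximum eccentricity over all nodes."""
    return max(eccentricity(adj, node) for node in adj)

grid = lattice(8)
grid_diameter = diameter(grid)      # 2 * (8 - 1) hops, corner to corner
small_world = lattice(8)
add_shortcuts(small_world, count=6)
small_world_diameter = diameter(small_world)
```

Because shortcuts only add edges, every original neighbor relation is preserved and the diameter can never grow; how much it shrinks depends on where the shortcuts land.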
Photometric methods in computer vision require calibration of the camera's radiometric response, and previous works have addressed this problem using multiple registered images captured under different camera exposure settings. In many instances, such an image set is not available, so we propose a method that performs radiometric calibration from only a single image, based on measured RGB distributions at color edges. This technique automatically selects appropriate edge information for processing and employs a Bayesian approach to compute the calibration. Extensive experimentation has shown that accurate calibration results can be obtained using only a single input image.
Graphical models are powerful tools for processing images. However, the large dimensionality of even local image data poses a difficulty. Representing the range of possible graphical model node variables with discrete states leads to an overwhelmingly large number of states for the model, often making both exact and approximate inference computationally intractable. We propose a representation that allows a small number of discrete states to represent the large number of possible image values at each pixel or local image patch. Each node in the graph represents the best regression function, chosen from a set of candidate functions, for estimating the unobserved image pixels from the observed samples. This permits a small number of discrete states to summarize the range of possible image values at each point in the image. Belief propagation is then used to find the best regressor to use at each point. To demonstrate the usefulness of this technique, we apply it to two problems: super-resolution and color demosaicing. In both cases, we find our method compares well against other techniques for these problems.
Motivated by the success of parts-based representations in face detection, we have attempted to address some of the problems associated with applying such a philosophy to the task of face verification. Hitherto, a major problem with this approach in face verification has been the intrinsic lack of training observations, stemming from individual subjects, with which to estimate the required conditional distributions. The estimated distributions have to be general enough to encompass the differing permutations of a subject's face yet still be able to discriminate between subjects. In our work, the well-known Gaussian mixture model (GMM) framework is employed to model the conditional density function of the parts-based representation of the face. We demonstrate that excellent performance can be obtained from our GMM-based representation through the employment of adaptation theory, specifically relevance adaptation (RA). Our results are presented for the frontal images of the BANCA database.