This correspondence describes an approach to reducing the computational cost of document image decoding by viewing it as a heuristic search problem. The kernel of the approach is a modified dynamic programming (DP) al...
详细信息
This correspondence describes an approach to reducing the computational cost of document image decoding by viewing it as a heuristic search problem. The kernel of the approach is a modified dynamic programming (DP) algorithm, called the iterated complete path (ICP) algorithm, that is intended for use with separable source models. A set of heuristic functions are presented for decoding formatted text with ICP. Speedups of 3-25 over DP have been observed when decoding text columns and telephone yellow pages using ICP and the proposed heuristics.
The computation time of document image decoding can be significantly reduced by employing heuristics in the search for the best decoding of a text line. By using a cheap upper bound on template match scores, up to 99....
详细信息
ISBN:
(纸本)0819439851
The computation time of document image decoding can be significantly reduced by employing heuristics in the search for the best decoding of a text line. By using a cheap upper bound on template match scores, up to 99.9% of the potential template matches can be avoided. In the Iterated Complete Path method, template matches are performed only along the best path found by dynamic programming on each iteration. When the best path stabilizes, the decoding is optimal and no more template matches need be performed. Computation can be further reduced in this scheme by exploiting the incremental nature of the Viterbi iterations. Because only a few trellis edge weights have changed since the last iteration, most of the backpointers do not need to be updated. We describe how to quickly identify these backpointers, without forfeiting optimality of the path. Together these improvements provide a 30x speedup over previous implementations of document image decoding.
An approach to supervised training of character templates from page images and unaligned transcriptions is proposed. The template training problem is formulated as one of constrained maximum likelihood parameter estim...
详细信息
An approach to supervised training of character templates from page images and unaligned transcriptions is proposed. The template training problem is formulated as one of constrained maximum likelihood parameter estimation within the document image decoding framework. This leads to a three-phase iterative training algorithm consisting pf transcription alignment, aligned template estimation (ATE), and channel estimation steps. The maximum likelihood ATE problem is shown to be NP-complete and, thus, an approximate solution approach is developed. An evaluation of the training procedure in a document-specific decoding task, using the University of Washington UW-II database of scanned technical journal articles, is described.
Beginning with an observed documentimage and a model of how the image has been degraded, document image decoding recognizes printed text by attempting to find a most probable path through a hypothesized Markov source...
详细信息
ISBN:
(纸本)0819439851
Beginning with an observed documentimage and a model of how the image has been degraded, document image decoding recognizes printed text by attempting to find a most probable path through a hypothesized Markov source. The incorporation of linguistic constraints, which are expressed by a sequential predictive probabilistic language model, can improve recognition accuracy significantly in the case of moderately to severely corrupted documents. Two methods of incorporating linguistic constraints in the best-path search are described analyzed and compared. The first, called the iterated complete path algorithm, involves iteratively rescoring complete paths using conditional language model probability distributions of increasing order, expanding state only as necessary with each iteration. A property of this approach is that it results in a solution that is exactly optimal with respect to the specified source, degradation, and language models;no approximation is necessary. The second approach considered is the Stack algorithm, which is often used in speech recognition and in the decoding of convolutional codes. Experimental results are presented in which text line images that have been corrupted in a known way are recognized using both the ICP and Stack algorithms. This controlled experimental setting preserves many of the essential features and challenges of real text line decoding, while highlighting the important algorithmic issues.
Turbo recognition (TR) is a communication theory approach to the analysis of rectangular layouts, in the spirit of document image decoding. The TR algorithm, inspired by turbo decoding, is based on a generative model ...
详细信息
ISBN:
(纸本)0819439851
Turbo recognition (TR) is a communication theory approach to the analysis of rectangular layouts, in the spirit of document image decoding. The TR algorithm, inspired by turbo decoding, is based on a generative model of image production, in which two grammars are used simultaneously to describe structure in orthogonal (horizontal and vertical) directions. This enables TR to strictly embody non-local constraints that cannot be taken into account by local statistical methods. This basis in finite state grammars also allows TR to be quickly retargetable to new domains. We illustrate some of the capabilities of TR with two examples involving realistic images. While TR, like turbo decoding, is not guaranteed to recover the statistically optimal solution. we present an experiment that demonstrates its ability to produce optimal or near-optimal results on a simple yet nontrivial example, the recovery of a filled rectangle in the midst of noise. Unlike methods such as stochastic context free grammars and exhaustive search, which are often intractable beyond small images, turbo recognition scales linearly with image size, suggesting TR as an efficient yet near-optimal approach to statistical layout analysis.
暂无评论