We present a complete method for character string extraction that is specifically adapted to a cartographic context (orientation, ...). We emphasize a high-level reconstruction process that (1) resolves the ambiguities remaining from pattern analysis and (2) structures the characters into strings. The knowledge used by this process is the coherence of the strings to be constructed (e.g. in orientation, scale, fonts, ...). We formalize the problem in the most general way and then admit two simplifying hypotheses that transform it into a graph optimization problem. The implemented method combines several techniques, including graph theory, dynamic programming, heuristics, and combinatorial exploration. The results seem sufficiently good for an industrial application.
Model-based column segmentation is described. Sequences of horizontal white space across a column are used as the basic features. Structures of columns in a specific publication are described by two levels of regular expressions: column expressions (CE) and element expressions (EE). Additional spatial constraints for element attributes can be described. A CE represents patterns of element sequences. An EE represents patterns of white space sequences for each element type. Segmentation is performed in three steps: element candidate extraction using EEs, column structure verification using the CE and ranking by comparison with statistical data. Experiments were performed on columns in two different scientific journals. More than 70% of the columns were correctly segmented as the top choice and more than 87% were in the top three choices. When spatial constraints were applied to element attributes, the rate was more than 90%.
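The two-level scheme described above can be sketched in Python as a rough illustration. Everything concrete here is an assumption made for the example and is not taken from the paper: the gap-height thresholds, the symbols `s`/`m`/`l`, the element types `T` (title) and `P` (paragraph), and the two expressions themselves. The sketch also uses greedy left-to-right matching and omits the statistical ranking step, so it yields at most one segmentation rather than a ranked list of candidates.

```python
import re

def encode_gaps(gap_heights, small=4, large=12):
    """Map each horizontal white-space height (in pixels) to a symbol.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    symbols = []
    for h in gap_heights:
        if h <= small:
            symbols.append("s")   # small gap, e.g. between text lines
        elif h < large:
            symbols.append("m")   # medium gap, e.g. after a paragraph
        else:
            symbols.append("l")   # large gap, e.g. after a title
    return "".join(symbols)

# Hypothetical element expressions (EE): one white-space pattern per element type.
ELEMENT_EXPRESSIONS = {
    "T": re.compile(r"s*l"),  # title: line gaps closed by a large gap
    "P": re.compile(r"s*m"),  # paragraph: line gaps closed by a medium gap
}

# Hypothetical column expression (CE): pattern over the element-type sequence.
COLUMN_EXPRESSION = re.compile(r"T?P+")

def segment(gap_string):
    """Extract elements with the EEs, then verify the sequence with the CE.

    Returns a list of (type, start, end) tuples, or None if no valid
    segmentation is found.
    """
    pos, elements = 0, []
    while pos < len(gap_string):
        for name, ee in ELEMENT_EXPRESSIONS.items():
            m = ee.match(gap_string, pos)
            if m and m.end() > pos:
                elements.append((name, pos, m.end()))
                pos = m.end()
                break
        else:
            return None  # no element expression matches at this position
    types = "".join(name for name, _, _ in elements)
    return elements if COLUMN_EXPRESSION.fullmatch(types) else None
```

For example, `segment(encode_gaps([3, 3, 14, 2, 2, 8, 3, 7]))` splits the column into one title followed by two paragraphs, while a gap sequence that never closes an element is rejected.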
Extracting structural information from paper documents supports daily document processing by, for example, automatically finding index terms and document topics. Knowledge about such components is modeled in a semantic net, which describes geometric properties, spatial relationships, lexical entities, and lexical relationships. The document model is used to extract the sender, date, recipient, and opening and closing formulas from a business letter. A total of 181 business letters were processed: 20 for training and the remaining 161 for testing. The error rates for the test set range from 0.022 to 0.049 at an average rejection rate of 0.4. Results show that the computational effort for matching can be limited to O(n^2), given n primitive objects.
Characters written by the same writer are expected to have the following two characteristics. (1) Characters belonging to the same category have similar shapes. (2) There is a shape correlation among characters belonging to different categories. This paper aims to improve recognition performance using these characteristics. First, it describes a method to verify these personal handwriting characteristics using features transformed through principal component analysis. Next, based on the idea that a misrecognized character has an unnatural shape relation with other, correctly recognized characters, it describes two methods to detect such unnaturalness: "within category" detection and "between category" detection. Recognition performance has been improved significantly, especially when unnaturalness is combined with the distances obtained in the recognition process.
For the analytic on-line recognition of handwriting, the range of pattern recognition problems can be described by the severity of the letter segmentation required. More difficult problems require an interaction of letter segmentation and recognition. These problems include overlapping discretely written characters, pure cursive writing, and mixed cursive and discrete writing. In addition to these letter segmentation problems, there is the problem of word segmentation. Since a script can contain numbers and capital letters as well as lowercase letters, a system must be able to recognize all of them. This paper describes an on-line system for identifying and recognizing numerals and capital letters in handwritten sentences. The system provides two segmentation modules: the first isolates the word drawings within a sentence, and the second separates numerals and capital letters from mixed writing prior to their recognition.
The paper proposes a method for extracting slant characters from complicated background figures efficiently and rapidly. In this method, slant character candidates are extracted using black pixel density features, that is, the matching rate between the original image and two circular templates of different sizes, one inscribing and one circumscribing a target character. To evaluate the proposed method, it was applied to 41 topographic map images (512 × 512 pixels) containing 1032 slant characters. As a result, the average number of character candidates per character was reduced to about 41, and 94.3% of the 1032 slant characters were extracted correctly.
The author proposes an extraction method for signatures and seal imprints using their color information for the automatic verification of Japanese bank checks. The colors of the signature, the seal imprint, and the background pattern are generally different from each other. Accordingly, the pixels of a color bank-check image form three clusters in RGB 3D color space, corresponding to the signature, the seal imprint, and the background. A specific color region (i.e. the signature or the seal imprint) can be extracted by orthogonally projecting all pixels of the bank-check image onto an appropriate axis in RGB color space and then thresholding. An extraction program based on this method has been tested on about 40 images of real Japanese bank checks with handwritten signatures and seal imprints. Experimental results show that this method can extract signatures and seal imprints separately and accurately.
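The projection-and-threshold step can be illustrated with a minimal Python sketch. The axis, the threshold, and the sample pixel colors below are hypothetical values chosen for the example; the abstract does not say how the appropriate axis or threshold is selected.

```python
import math

def unit(vector):
    """Normalize a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vector))
    return tuple(c / norm for c in vector)

def project(pixel, axis):
    """Orthogonal projection of an RGB pixel onto a unit axis (dot product)."""
    return sum(p * a for p, a in zip(pixel, axis))

def extract_region(image, axis, threshold):
    """Return a boolean mask: True where the projection exceeds the threshold.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    axis = unit(axis)
    return [[project(px, axis) > threshold for px in row] for row in image]

# Hypothetical colors: a red seal imprint, a dark signature, a pale background.
SEAL, SIGNATURE, BACKGROUND = (200, 30, 30), (20, 20, 60), (230, 230, 220)

# Assumed axis: red minus green, which separates the reddish cluster from
# the two clusters whose red and green components are roughly equal.
seal_mask = extract_region([[SEAL, SIGNATURE, BACKGROUND]], (1, -1, 0), 60)
```

With these assumed values only the seal pixel projects above the threshold; a different axis would be needed to isolate the signature cluster.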
Rumours of the death of the problem of machine-printed text recognition have been greatly exaggerated. Reported results can be good enough to lead one to believe that this is a "solved problem". Closer analysis reveals test data that is often limited in its range of fonts and point sizes. Worse still, results are commonly quoted for noise-free images, ignoring the problems of recognising "real" documents such as faxes. Various methods have been proposed for modelling characters with Hidden Markov Models (HMMs). The authors, amongst others, have suggested representing a character by analysing the pixel pattern in columns of its image and linking sequential column patterns together with an HMM. In this paper we propose a method of quantising the patterns by means of a Shift Invariant Hamming Distance. A full experimental evaluation (45 fonts, 5 point sizes) in typical noise results in a recognition accuracy of 99% in the top-3 choices, and 94% top-choice for the best font. The method has a significant advantage in recognising noisy word images, because classification is achieved without a prior segmentation of the word into characters.
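The abstract does not define the Shift Invariant Hamming Distance. One plausible reading, given purely as an assumption, is the minimum Hamming distance between two binary pixel columns over a bounded range of vertical shifts. The sketch below implements that reading; the function names, the zero padding, and the `max_shift` parameter are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary columns differ."""
    return sum(x != y for x, y in zip(a, b))

def shift(column, k):
    """Shift a column down by k pixels (up if k is negative), padding with 0.

    Assumes abs(k) is smaller than the column length.
    """
    n = len(column)
    if k >= 0:
        return [0] * k + column[:n - k]
    return column[-k:] + [0] * (-k)

def shift_invariant_hamming(a, b, max_shift=2):
    """Assumed definition: minimum Hamming distance over vertical shifts of b."""
    return min(hamming(a, shift(b, k)) for k in range(-max_shift, max_shift + 1))

# Two columns holding the same stroke, offset vertically by one pixel.
col_a = [0, 1, 1, 0, 0]
col_b = [0, 0, 1, 1, 0]
```

Under this reading the two example columns differ in two positions under the plain Hamming distance but are at distance zero once a one-pixel shift is allowed, which is the kind of tolerance that would make column quantisation less sensitive to small vertical displacement.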
This work proposes a new approach to signature verification. It is inspired by human learning and by the approach adopted by expert signature examiners, in which a priori knowledge of the class of forgeries is not required in order to perform the verification task. Based on this approach, we present a Fuzzy ARTMAP based system for the elimination of random forgeries. In contrast to the conventional systems proposed thus far, the presented system is trained with genuine signatures only. Six experiments have been performed on a database of 200 signatures taken from five writers (40 signatures per writer). The system was evaluated using different numbers of training signatures.
ISBN: (print) 0780318978
The formalism of support logic provides a framework for deductive inference, with a mathematically sound and consistent treatment of uncertainty and of the evidence that is aggregated through the reasoning process. We apply support logic programming to pattern recognition. Initially, a pattern classifier is constructed by encoding expert knowledge of the problem domain into rules of support logic. Fuzzy sets allow the general properties of features to be described precisely. Semantic unification provides an alternative to the usual metric-based similarity criteria. The validity of the approach is established by cross-validating the support logic classifier against models from alternative paradigms. We then attempt to circumvent the requirement for a domain expert, and assess the extent to which data-driven learning processes can be used to automatically derive components of the support logic classifier.