ISBN (print): 9781479961016
This article presents our recent study on fusing information at the feature level and at the classifier output level to improve offline handwritten Devanagari word recognition. We consider two state-of-the-art feature sets, Directional Distance Distribution (DDD) and Gradient-Structural-Concavity (GSC), together with multi-class SVM classifiers. We study various combinations of DDD features with one or more features from the GSC feature set, presenting the different combined feature vectors as input to SVM classifiers. In addition, the output vectors of different SVM classifiers, each fed with a different feature vector, are combined by another SVM classifier. Combining the outputs of two SVMs, each fed with a different feature vector, outperforms a single SVM classifier fed with the combined feature vector. Experimental results are obtained on a large database of handwritten Devanagari word sample images covering 100 Indian town names. Recognition results on its test samples show that combining the SVM output for DDD features with the SVM output for GSC features improves the final recognition accuracy significantly.
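The difference between the two fusion schemes can be sketched in terms of what each classifier receives as input. The following is a minimal illustration, not the authors' implementation: the function names, the toy feature dimensions, and the assumption that each first-stage SVM emits exactly one score per class are all hypothetical.

```python
from typing import List, Sequence

NUM_CLASSES = 100  # the database covers 100 town names


def feature_level_fusion(ddd: Sequence[float], gsc: Sequence[float]) -> List[float]:
    """Concatenate two feature vectors into one input for a single classifier."""
    return list(ddd) + list(gsc)


def output_level_fusion(scores_ddd: Sequence[float],
                        scores_gsc: Sequence[float]) -> List[float]:
    """Concatenate the per-class score vectors of two first-stage classifiers.

    The result is the input vector of the second-stage (combining) classifier.
    Assumes one score per class from each first-stage classifier.
    """
    if len(scores_ddd) != NUM_CLASSES or len(scores_gsc) != NUM_CLASSES:
        raise ValueError("each first-stage classifier must emit one score per class")
    return list(scores_ddd) + list(scores_gsc)


if __name__ == "__main__":
    # Toy dimensions (8 and 12) chosen only for illustration.
    combined_features = feature_level_fusion([0.0] * 8, [1.0] * 12)
    combined_scores = output_level_fusion([0.1] * NUM_CLASSES, [0.2] * NUM_CLASSES)
    print(len(combined_features), len(combined_scores))
```

Under these assumptions, the second-stage classifier sees a fixed-length vector of 2 × 100 scores regardless of how large the underlying DDD and GSC feature vectors are.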
In this paper we present how Bag-of-Features Hidden Markov Models can be applied to printed Bangla word spotting. These statistical models allow for an easy adaption to different problem domains. This is possible due ...
Applications in medical image analysis suffer from an acute shortage of large volumes of data properly annotated by medical experts. Supervised learning algorithms require large volumes of balanced data to learn robust ...
Optical Character Recognition (OCR) has been deployed in the past in different application areas such as automatic transcription and indexing of document images, reading aids for visually impaired persons, postal a...
This paper presents a pronominal anaphora resolution (PAR) approach that uses global discourse knowledge along with other traditional features. So far, the features used to find the referent of an anaphoric pronoun have been computed locally. Normally, the sentence containing the anaphor and a few sentences immediately before it form the local context. In this process, the knowledge base is updated as more and more of the discourse is processed. Keeping this approach as the core, the present paper explores the use of some prior knowledge obtained by examining the entire discourse (the whole article). Adding this processing step improves the efficiency of PAR. The improvement is demonstrated on the ICON 2011 Bangla dataset.
Extraction of some meta-information from printed documents without an OCR approach is considered. It can be statistically verified that important terms in articles are printed in italic, bold, and all-capital styles. Detecting these type styles helps in automatically extracting the lines containing titles, authors' names, subtitles, and references, as well as sentences in which important terms occur. It also helps improve OCR performance when reading italic text. Some experimental results on the performance of the approach on good-quality as well as degraded document images are presented.
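One plausible cue for detecting bold words without character recognition is stroke thickness. The sketch below is an assumption made for illustration, not the method of the paper: it estimates stroke width as the median length of horizontal runs of ink pixels in a binarized word image (1 = ink) and flags a word as bold when that estimate exceeds a typical width for the page by a hypothetical ratio. The function names and the default ratio of 1.5 are invented here.

```python
from statistics import median
from typing import List, Sequence


def horizontal_run_lengths(image: Sequence[Sequence[int]]) -> List[int]:
    """Lengths of all maximal horizontal runs of ink pixels (value 1)."""
    runs: List[int] = []
    for row in image:
        length = 0
        for pixel in row:
            if pixel == 1:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run that reaches the right edge of the image
            runs.append(length)
    return runs


def stroke_width(image: Sequence[Sequence[int]]) -> float:
    """Median horizontal run length; 0.0 for an image without ink."""
    runs = horizontal_run_lengths(image)
    return float(median(runs)) if runs else 0.0


def looks_bold(word_image: Sequence[Sequence[int]],
               typical_width: float,
               ratio: float = 1.5) -> bool:
    """Flag a word whose stroke width is at least `ratio` times the typical one."""
    return stroke_width(word_image) >= ratio * typical_width
```

The median is used so that a few long runs from horizontal strokes do not dominate the estimate; a real system would need to calibrate `typical_width` per document and font size.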
Recognition of handwritten similar-shaped characters is a difficult problem, and in character recognition systems most errors arise from similar-shaped characters. In this paper we propose a novel feature extract...
ISBN (print): 0769519601
A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a page, it is necessary to separate the different scripts before feeding them to their individual OCR systems. In this paper, an automatic scheme is presented to identify text lines of different Indian scripts in a document. For the separation task, the scripts are first grouped into a few classes according to script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify them without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.
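The water reservoir idea can be illustrated with a simplified sketch: if water is poured onto a component from above, the cavities that would hold water are treated as top reservoirs. The code below is an assumption made for illustration, not the scheme of the paper. It works only on the top profile of a binarized component (1 = ink, row 0 at the top), so overhangs are ignored, and the function names are hypothetical.

```python
from typing import List, Sequence


def top_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Height of the ink seen from above, per column (0 if the column is empty)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    heights: List[int] = []
    for c in range(cols):
        height = 0
        for r in range(rows):
            if image[r][c] == 1:
                height = rows - r  # first ink pixel met when scanning downward
                break
        heights.append(height)
    return heights


def top_reservoir_area(image: Sequence[Sequence[int]]) -> int:
    """Pixels of water held by the top profile when water is poured from above."""
    heights = top_profile(image)
    n = len(heights)
    if n == 0:
        return 0
    left_max = [0] * n   # tallest wall at or to the left of each column
    right_max = [0] * n  # tallest wall at or to the right of each column
    running = 0
    for i in range(n):
        running = max(running, heights[i])
        left_max[i] = running
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, heights[i])
        right_max[i] = running
    # Water level in a column is bounded by the lower of the two walls.
    return sum(min(left_max[i], right_max[i]) - heights[i] for i in range(n))


if __name__ == "__main__":
    u_shape = [[1, 0, 0, 1],
               [1, 0, 0, 1],
               [1, 1, 1, 1]]
    print(top_reservoir_area(u_shape))
```

In this simplification, a U-shaped component holds water in its cavity while an inverted U holds none from the top, which is the kind of shape cue a reservoir-based feature could exploit.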
ISBN (print): 0769519601
This paper deals with an Optical Character Recognition (OCR) system for printed Urdu, a popular Indian script. Developing OCR for this script is difficult because (i) a large number of characters have to be recognized and (ii) there are many similar-shaped characters. In the proposed system, individual characters are recognized using a combination of topological, contour, and water reservoir concept based features. The feature detection methods are simple and robust. A prototype of the system has been tested on printed Urdu characters and currently achieves 97.8% character-level accuracy on average.
We propose simple and fast algorithms for detecting italic, bold, and all-capital words without performing actual character recognition. We present a statistical study which reveals that detecting such words may play a key role in automatic information retrieval from documents. Moreover, detection of italic words can be used to improve the recognition accuracy of a text recognition system. A considerable number of document images have been tested; our algorithms give accurate results on all of them and are very easy to implement.
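As an illustration of how an all-capital word might be flagged without recognizing its characters, consider the heights of the character bounding boxes: in an all-capital word they are nearly uniform and clearly exceed the x-height of the text line. The sketch below is a hypothetical heuristic, not the algorithm of the paper; the function name, the 1.2 factor, and the tolerance are assumptions, and digits or words made only of ascender letters would also pass this test.

```python
from typing import Sequence


def looks_all_capital(char_heights: Sequence[float],
                      x_height: float,
                      tolerance: float = 0.15) -> bool:
    """Heuristic all-capital test on character bounding-box heights.

    char_heights: heights of the character boxes of one word, in pixels.
    x_height: estimated height of lowercase letters such as 'x' on the line.
    """
    if len(char_heights) < 2 or x_height <= 0:
        return False  # too little evidence to decide
    tallest = max(char_heights)
    if tallest < 1.2 * x_height:
        return False  # no character rises clearly above the x-height
    # Every character must be almost as tall as the tallest one.
    return all(h >= (1.0 - tolerance) * tallest for h in char_heights)
```

For example, heights of 30, 30, 29, 30 on a line with x-height 20 would be flagged, while 30, 20, 20, 21 (a capitalized lowercase word) would not.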