Text/graphics separation is one of the main concerns of current research in document image analysis. The problem becomes more complex when text and graphics overlap, as they do in color map images. This paper discusses a number of improvements to text/graphics separation methods that make them suitable for maps. Emphasis is given to the regions where text and graphics overlap. The paper also discusses a color separation method based on clustering for the purpose of text/graphics separation.
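The abstract does not say which clustering method is used for color separation. As a minimal sketch, assuming plain k-means on RGB values (an assumption, not the paper's confirmed method), pixels of a color map could be grouped into color layers as follows. The function name `kmeans_colors` and the deterministic seeding are illustrative choices.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels: List[Sequence[float]], k: int,
                  iterations: int = 20) -> Tuple[List[int], List[Color]]:
    """Cluster RGB pixels into k color layers with plain k-means.

    Assumption: centers are seeded with the first k distinct colors,
    which keeps the sketch deterministic.
    """
    centers: List[Color] = []
    for p in pixels:
        candidate = tuple(float(c) for c in p)
        if candidate not in centers:
            centers.append(candidate)
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("fewer than k distinct colors in the input")

    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [min(range(k), key=lambda j: _dist2(p, centers[j]))
                  for p in pixels]
        # Update step: each center moves to the mean of its members.
        new_centers: List[Color] = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Dark (text-like) and light (background-like) pixels, as a toy example.
pixels = [(0, 0, 0), (10, 5, 5), (250, 250, 250), (240, 255, 245), (5, 0, 10)]
labels, centers = kmeans_colors(pixels, k=2)
```

Each resulting label defines one color layer; a text/graphics separation step could then be run on each layer separately.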
This paper presents a pioneering study on automatic dating of handwritten manuscripts. Analysis of handwriting style forms the core of the dating method. Initially, it is hypothesized that a manuscript can be dated, to a certain level of accuracy, by looking at the way it is written. The hypothesis is then verified with real samples of known dates. A general framework is proposed for machine dating of handwritten manuscripts. Experiments on a database of manuscripts by Gustave Flaubert (1821-1880), the famous French novelist, report about 62% accuracy when manuscripts are dated within a range of five calendar years of their exact year of writing.
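The evaluation measure described above (a prediction counts as correct when it falls within a tolerance of the true year) can be sketched as follows. The abstract does not define exactly how the five-year range is measured, so the tolerance is left as a parameter, and the years below are invented toy values, not data from the paper.

```python
from typing import Sequence


def accuracy_within(predicted: Sequence[int], actual: Sequence[int],
                    tolerance_years: int) -> float:
    """Fraction of manuscripts whose predicted year lies within
    tolerance_years of the true year (inclusive on both sides)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(p - a) <= tolerance_years)
    return hits / len(actual)


# Toy values for illustration only.
predicted_years = [1850, 1860, 1845, 1870]
true_years = [1852, 1851, 1845, 1876]
score = accuracy_within(predicted_years, true_years, tolerance_years=5)
```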
This paper deals with a quadratic-classifier-based scheme for the recognition of off-line handwritten numerals of Kannada, an important Indian script. The features used in the classifier are obtained from the directional chain code information of the contour points of the characters. The bounding box of a character is segmented into blocks, and a chain code histogram is computed in each block. We use 64-dimensional and 100-dimensional features for a comparative study of the recognition accuracy of the proposed system. These chain code features are fed to the quadratic classifier for recognition. We tested the scheme on 2300 data samples and obtained 97.87% and 98.45% recognition accuracy with the 64-dimensional and 100-dimensional features respectively, using five-fold cross-validation.
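The block-wise chain code histogram can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's exact feature extractor: the contour is assumed to be given as an ordered, closed, 8-connected list of (x, y) points, opposite directions are folded together into 4 direction bins, and the grid size is a free parameter. With these assumptions a 4x4 grid happens to give 64 values and a 5x5 grid gives 100, but the abstract does not confirm that this is how its two dimensionalities arise.

```python
from typing import List, Tuple

# Freeman 8-direction codes keyed by the step (dx, dy) between neighbours.
DIRECTION_CODES = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code_features(contour: List[Tuple[int, int]], grid: int) -> List[int]:
    """Histogram of folded chain codes per block of the bounding box.

    Returns grid * grid * 4 counts, ordered block by block (row-major),
    with 4 direction bins per block.
    """
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    x0, y0 = min(xs), min(ys)
    width = max(xs) - x0 + 1
    height = max(ys) - y0 + 1

    features = [0] * (grid * grid * 4)
    n = len(contour)
    for i in range(n):
        x, y = contour[i]
        nx, ny = contour[(i + 1) % n]  # wrap around: the contour is closed
        # Raises KeyError if consecutive points are not 8-connected.
        code = DIRECTION_CODES[(nx - x, ny - y)]
        direction = code % 4  # fold opposite directions into one bin
        bx = (x - x0) * grid // width   # block column of the current point
        by = (y - y0) * grid // height  # block row of the current point
        features[(by * grid + bx) * 4 + direction] += 1
    return features


# A 4x4 square outline as a toy closed contour (12 points).
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
features_2x2 = chain_code_features(square, grid=2)
```

In a full system the resulting vector would typically be normalized and passed to the classifier; that stage is not shown here.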
Struck-out words are often found in handwritten manuscripts. A realistic off-line handwriting recognition system should take care of this common aspect. A simple but efficient approach to this problem is to subject ea...
This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. Construction of a representative corpus of technical and scientific documents containing expressions is...
Recognition of handwritten characters is difficult because of the variability in the writing styles of different individuals. This paper deals with recognition of off-line Bangla handwritten characters using quadr...
No significant research work towards recognition of handwritten Bangla characters has yet been done. Only a few works in this area are found in the literature which are based on small databases collected in laborator...
This paper deals with segmentation and recognition of touching characters appearing in scanned mathematical expressions. The technique is based on multifactorial analysis that integrates several factors determining cut-positions in a touching character image. A predictive algorithm is developed for efficient selection of possible cut-positions for segmenting touching characters. Experiments were carried out on a test set of reasonable size, and the results show that a considerable improvement in recognition accuracy can be achieved with a modest increase in computation.
Postal automation has been a topic of research over the last few years. Much work has been done on postal automation in the USA, UK, Japan and Australia, but there is no significant work on Indian postal automation. This paper deals with word-wise handwritten script identification for Indian postal automation. In the proposed scheme, document skew is first detected and corrected. Non-text parts are then segmented from the document using the run length smoothing algorithm (RLSA). Next, using a piecewise projection method, the destination address block (DAB) is segmented first into lines and then into words. Using the water reservoir concept, we compute the busy-zone of each word. Finally, using features based on the matra/Shirorekha, the water reservoir concept, fractals, etc., a neural network (NN) classifier is generated for word-wise identification of Bangla and English scripts. The overall accuracy of the proposed system is at present 97.62%.
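The run length smoothing step can be illustrated with a short sketch. This is a generic horizontal RLSA on a binary image (1 = ink, 0 = background), not the paper's implementation: the threshold value is an assumed parameter, and this variant only fills background gaps that have ink on both sides. Some formulations also fill short runs at the row borders.

```python
from typing import List


def rlsa_row(row: List[int], threshold: int) -> List[int]:
    """Fill background gaps of length <= threshold that lie between
    two foreground pixels in a single binary row."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, value in enumerate(row):
        if value:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Apply horizontal run length smoothing to every row of the image."""
    return [rlsa_row(row, threshold) for row in image]


# The gap of 2 is filled; the gap of 4 is left open.
smoothed = rlsa_row([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0], threshold=2)
```

After smoothing, nearby characters merge into word-sized or line-sized blobs, whose shape and size can be used to tell text regions from non-text regions.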
Stemming is used in many information retrieval (IR) systems to reduce variant word forms to common roots and thereby improve overall retrieval efficiency. This paper presents an algorithm for stemming in the context of a document image retrieval system. The algorithm assumes that the documents are symbolically compressed, and stemming is attempted in the compressed domain itself. Experiments have been conducted on Indian-language document images, for which efficient OCR still remains a challenging task. Results obtained from a set of 150 document images (in Bangla script, the second most popular script in the Indian subcontinent) consisting of about 12K words show promising performance of the proposed approach.
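To show what reducing variant word forms to common roots means, here is a text-domain illustration only. It is not the paper's algorithm, which works on symbolically compressed document images rather than character strings. The suffix list and the minimum stem length are hypothetical choices, and English words are used purely for readability.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical suffix list for illustration.
SUFFIXES = ("ions", "ing", "ed", "s")


def stem(word: str, suffixes: Sequence[str] = SUFFIXES,
         min_stem: int = 3) -> str:
    """Strip the longest matching suffix, provided that at least
    min_stem characters remain."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


def group_by_stem(words: Iterable[str]) -> Dict[str, List[str]]:
    """Group variant word forms under their shared stem."""
    groups: Dict[str, List[str]] = {}
    for word in words:
        groups.setdefault(stem(word), []).append(word)
    return groups


groups = group_by_stem(["connected", "connecting", "connections", "sing"])
```

A query for any one variant can then be matched against every word in the same group, which is the retrieval benefit that stemming provides.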