This paper presents a pronominal anaphora resolution (PAR) approach that uses global discourse knowledge along with traditional features. So far, the features used to find the referent of an anaphoric pronoun have been computed locally: normally, the sentence containing the anaphor and a few sentences immediately before it form the local context. In this process, the knowledge base is updated as more of the discourse is processed. Keeping this approach as the core, the paper explores the use of prior knowledge obtained by examining the entire discourse (the whole article). Adding this processing step improves the PAR's efficiency. The improvement is demonstrated on the ICON 2011 Bangla dataset.
In this paper, we describe an approach to distinguish hand-written text from machine-printed text in annotated machine-printed Bangla document images. In applications involving OCR, distinguishing machine-printed from hand-written characters is important so that they can be sent to separate recognition engines. Identifying hand-written parts is also useful for deleting those parts and cleaning the document image. We present a classification system that takes a connected component of the document image and assigns it to one of two classes, "machine-printed" or "hand-written". The proposed system contains a preprocessing step, which smooths the object border and finds the connected components. Bangla script-specific features are extracted from each connected component image, and a standard SVM-based classifier generates the final response. Experimental results on a data set show that the proposed approach achieves an overall accuracy of 96.49%.
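The component-extraction step described in this abstract can be illustrated with a minimal sketch. It assumes a binary image given as nested lists (1 = ink, 0 = background) and 8-connectivity; the abstract states neither the connectivity nor the smoothing method, so both are assumptions of this example, and the function name `connected_components` is hypothetical. Feature extraction and the SVM are not shown.

```python
from collections import deque


def connected_components(image):
    """Group foreground pixels (value 1) into 8-connected components.

    Returns a list of components, each a list of (row, col) tuples.
    8-connectivity is an assumption of this sketch, not taken from the paper.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


if __name__ == "__main__":
    toy_image = [
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
    ]
    for index, component in enumerate(connected_components(toy_image)):
        print(index, component)
```

In a pipeline of the kind the abstract describes, each returned component would be cropped, converted to a feature vector, and passed to the classifier.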
Extraction and recognition of text present in video has become a very popular research area in the last decade. Text in video frames generally varies in size, orientation, style, etc., and appears against complex backgrounds with noise, low resolution and low contrast. These factors make automatic text extraction and recognition in video frames a challenging task. A large number of techniques have been proposed by various researchers in the recent past to address the problem. This paper presents a review of state-of-the-art techniques proposed for the different stages (e.g. detection, localization, extraction, etc.) of text information processing in video frames. Given the growing popularity of, and recent developments in, the processing of text in video frames, this review details current trends and potential directions for further research to assist researchers.
ISBN: 9781467314886 (print)
In the field of information security, the use of biometrics for user authentication is growing. Automatic signature recognition and verification is one of several biometric techniques used to verify the identity of individuals. In this paper, a foreground- and background-based technique is proposed for identifying the script of bi-lingual (English/Roman and Chinese) off-line signatures. The system identifies whether a claimed signature belongs to the group of English signatures or Chinese signatures. Identifying a signature by its script is a major contribution towards multi-script signature verification. Two background information extraction techniques are used to produce the background components of the signature images. A gradient-based method was used to extract the features of the foreground as well as the background components. Zernike moment features were also employed on the signature samples. A Support Vector Machine (SVM) is used as the classifier for signature identification in the proposed system. A database of 1120 (640 English + 480 Chinese) signature samples was used for training, and 560 (320 English + 240 Chinese) signature samples were used for testing the proposed system. An encouraging identification accuracy of 97.70% was obtained in the experiment using the gradient feature.
This paper deals with the recognition of online handwritten Bangla (Bengali) text. We first segment cursive words into strokes. A stroke may represent a character or a part of a character. We selected a set of Bangla words written by different groups of people such that they contain all basic characters, all vowel and consonant modifiers, and almost all possible types of joining among them. For segmentation of text into strokes, we derived some rules by analyzing different joining patterns of Bangla characters. A combination of online and offline information was used for segmentation. We achieved a correct segmentation rate of 97.89% on the dataset. We manually analyzed the strokes to create a ground-truth set of distinct stroke classes for result verification, and obtained 85 stroke classes. Directional features were used with an SVM for recognition, and we achieved a correct stroke recognition rate of 97.68%.
Character recognition (printed and handwritten) systems have become an extremely useful tool in human-computer interaction. Handwriting is a complex perceptual motor task generating linguistic information. Characters re...
Automatic identification of an individual based on his/her handwriting characteristics is an important forensic tool. In a computational forensic scenario, the presence of a large amount of text in a questioned document cannot be ensured, and the lack of data threatens system reliability in such cases. We propose a writer identification system for Oriya script that is capable of performing reasonably well even with a small amount of text. Experiments with a curvature feature are reported, using a Support Vector Machine (SVM) as the classifier. We obtained promising results: 94.00% writer identification accuracy for the top choice, and 99% when considering the top three choices.
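The distinction between top-choice and top-three accuracy reported in this abstract can be made concrete with a small sketch. It assumes each test sample comes with a list of candidate writers ranked from most to least likely; the function name `top_k_accuracy` and the toy labels are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k ranked candidates."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    # Toy data: four handwriting samples, candidate writers ranked best-first.
    ranked = [
        ["w1", "w2", "w3"],
        ["w2", "w1", "w3"],
        ["w3", "w1", "w2"],
        ["w1", "w3", "w2"],
    ]
    truth = ["w1", "w1", "w2", "w2"]
    print(top_k_accuracy(ranked, truth, k=1))
    print(top_k_accuracy(ranked, truth, k=3))
```

Top-k accuracy can never decrease as k grows, which is why the top-three figure is at least as high as the top-choice figure.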
In recent years, many techniques for the recognition of Persian/Arabic handwritten documents have been proposed by researchers. To test the promise of different feature extraction and classification methods, and to provide a new benchmark for future research, this paper presents a comparative study of Persian/Arabic handwritten character recognition using different feature sets and classifiers. The feature sets used in this study are computed from gradient, directional chain code, shadow, under-sampled bitmap, intersection/junction/endpoint, and line-fitting information. Support Vector Machines (SVMs), Nearest Neighbour (NN), and k-Nearest Neighbour (k-NN) are used as classifiers. We evaluated the proposed systems on a standard dataset of Persian handwritten characters. Using 36682 samples for training, we tested the proposed recognition systems on another 15338 samples, and detailed results are reported. The best correct recognition rate obtained in this comparative study is 96.91%.
ISBN: 9781467322164 (print)
Font can be used as a notion of similarity among multiple documents written in the same script. Document images with a specific font could be retrieved automatically from a huge digital document repository. Optical font recognition could therefore be a useful pre-processing step in an automated questioned document analysis system, for sorting documents with similar fonts. We propose a scheme to identify 10 different fonts for an Indic script (Bangla). Curvature-based features are extracted from segmented characters and fed to a Support Vector Machine (SVM) classifier. The classifier determines the font type of each segmented character obtained from a document. The font of the document is then identified by majority voting among the 10 fonts over all characters. Using a Multiple Kernel SVM classifier, we obtained 98.5% accuracy on 400 test documents (40 documents for each font type).
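The document-level decision described in this abstract, majority voting over per-character font predictions, can be sketched as follows. The function name `document_font` and the font labels are hypothetical, and the tie-breaking rule (the label seen first among the tied ones wins) is an assumption of this sketch, since the abstract does not say how ties are handled.

```python
from collections import Counter


def document_font(character_fonts):
    """Return the font predicted for the most characters in one document.

    character_fonts: per-character font labels, e.g. the outputs of a
    character-level classifier. Ties go to the label encountered first,
    which is an assumption of this sketch.
    """
    if not character_fonts:
        raise ValueError("at least one character prediction is required")
    votes = Counter(character_fonts)
    winner, _count = votes.most_common(1)[0]
    return winner


if __name__ == "__main__":
    predictions = ["font_a", "font_b", "font_a", "font_c", "font_a", "font_b"]
    print(document_font(predictions))
```

Voting makes the document-level decision tolerant of individual character errors: the result is wrong only if some other font collects at least as many votes as the correct one.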
Misclassification of similar-shaped characters largely affects OCR accuracy. Sometimes occlusion or insertion of a part of a character (due to inferior scanning quality) also makes it look like another character type. In such adverse situations, a part-based character recognition system could be more effective. To counter this adverse scenario, we propose a new feature encoding technique based on the amalgamation of Gabor filter-based features with SURF features (G-SURF). Features generated from a character are provided to a Support Vector Machine (SVM) classifier. We obtained an encouraging accuracy on similar-shaped characters from three different scripts.