Automatic Identification of Oriental and Other Scripts in Image Documents

Authors: C. Y. SUEN, S. BERGLER, N. NOBILE, W. PAN, B. WAKED

Affiliation: Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, 1455 de Maisonneuve Blvd. West, Suite GM-606, Montreal, Quebec H3G 1M8, Canada

Publication: International Journal of Computer Processing of Languages

Year, volume, issue: 2005, Vol. 18, No. 2

Pages: 77-94

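Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]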

Keywords: OCR, Script, Arabic, Oriental, Roman, Cyrillic, Gabor filters

Abstract: An increasing number of paper documents are produced and received by many organizations. Frequently, they have to be digitized for electronic archiving and later information retrieval or data mining, which requires scanning and OCR. Since OCR techniques are language dependent, the language of the original document must be identified first. This paper describes two methods of identifying Oriental languages among four language groups: Oriental, Roman, Cyrillic, and Arabic. One method is based on features extracted from the shapes of words and letters, while the other is based on global analysis of text pieces using Gabor filters. Experimental results on hundreds of both clean and noisy documents indicate that the proposed classification approaches are promising. The use of linguistic analysis to enhance the results is also discussed.
