Optical character recognition (OCR) for cursive, right-to-left languages such as Arabic is challenging, and in the past it has received little attention from researchers compared with Latin-script languages. The absence of a standard, publicly available dataset for several low-resource languages, including Pashto, has remained a hurdle to progress in recognizing these languages. Because a clean dataset is the fundamental requirement for character recognition, this research begins with dataset generation. Its longer-term aim is a system capable of complete, fully autonomous recognition of cursive Pashto text. The first achievement of this research is a clean, standard dataset of isolated Pashto characters. This paper presents a database of isolated characters for the forty-four letters of the Pashto alphabet, rendered in various font styles. To overcome the shortage of available font styles, the graphics software Inkscape has been used to generate sufficient image samples for each character. The dataset has been pre-processed, reduced to 32×32 pixels, and converted to a binary format with a black background and white text so that it resembles the Modified National Institute of Standards and Technology (MNIST) dataset. The benchmark database is publicly available for further research on GitHub and Kaggle, in both image (pixel) and comma-separated values (CSV) formats.
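The preprocessing steps described above (resizing to 32×32, binarizing to white text on a black background, and exporting MNIST-style CSV rows) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how images were resized or binarized, so nearest-neighbour resizing, the fixed threshold of 128, input images with dark glyphs on a light background, integer class labels, and a `label, pixel0 ... pixel1023` CSV header (modelled loosely on common MNIST-style CSV files) are all assumptions. The function names are hypothetical.

```python
import csv
import io

import numpy as np

TARGET_SIZE = 32  # 32x32 pixels, as stated in the abstract


def resize_nearest(img: np.ndarray, size: int = TARGET_SIZE) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D grayscale array to size x size.

    Assumption: the interpolation method is not stated in the source.
    """
    h, w = img.shape
    rows = np.arange(size) * h // size  # source row index for each output row
    cols = np.arange(size) * w // size  # source column index for each output column
    return img[rows[:, None], cols[None, :]]


def to_mnist_style(img, threshold: int = 128) -> np.ndarray:
    """Resize, then binarize so glyph pixels are white (255) on black (0).

    Assumption: the input shows dark glyphs on a light background, so pixels
    darker than `threshold` are treated as text.
    """
    small = resize_nearest(np.asarray(img, dtype=np.uint8))
    return np.where(small < threshold, 255, 0).astype(np.uint8)


def to_csv_row(label: int, img: np.ndarray) -> list:
    """Flatten one 32x32 image into [label, pixel0, ..., pixel1023]."""
    return [label] + img.flatten().tolist()


def write_csv(samples) -> str:
    """Serialize (label, image) pairs to CSV text with a hypothetical header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["label"] + [f"pixel{i}" for i in range(TARGET_SIZE * TARGET_SIZE)])
    for label, img in samples:
        writer.writerow(to_csv_row(label, img))
    return buf.getvalue()


if __name__ == "__main__":
    # Synthetic 64x64 "glyph": a black vertical bar on white, standing in
    # for a character image exported from a font.
    glyph = np.full((64, 64), 255, dtype=np.uint8)
    glyph[8:56, 28:36] = 0
    sample = to_mnist_style(glyph)
    print(sample.shape, int((sample == 255).sum()))  # (32, 32) 96
    print(write_csv([(0, sample)]).splitlines()[0][:20])
```

Nearest-neighbour sampling keeps the output strictly two-valued after thresholding and needs no imaging library; an area-averaging resize followed by the same threshold would give smoother strokes at the cost of an extra dependency such as Pillow.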