Title :
A comprehensive handwritten image corpus of isolated Persian/Arabic characters for OCR development and evaluation
Author :
Khosravi, Sara ; Razzazi, Farbod ; Rezaei, Hamideh ; Sadigh, Mohammad Reza
Author_Institution :
Payasoft Co., Tehran
Abstract :
This paper reports the specifications, design, and implementation issues of a comprehensive corpus of capital isolated handwritten character images for the Persian/Arabic languages. The corpus has been designed for both OCR development and evaluation. It contains more than 10 million characters with appropriate image quality and is supported by rich metadata in a standard ground-truth format. An evaluation of the corpus revealed that more than 99.9% of the images are correctly labeled and that the quality of more than 99.5% of the images is suitable for OCR development and evaluation. The corpus may be used as a standard benchmark for Persian/Arabic OCR systems.
Keywords :
handwritten character recognition; natural languages; optical character recognition; visual databases; OCR development; Persian-Arabic languages; capital isolated handwritten character image corpus; image databases; metadata; Data mining; Design methodology; Filling; Image quality; Optical character recognition software; Robustness; Software libraries; Standards development; XML
Conference_Title :
Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-0778-1
Electronic_ISBN :
978-1-4244-1779-8
DOI :
10.1109/ISSPA.2007.4555567