DocumentCode :
1990143
Title :
A comprehensive handwritten image corpus of isolated persian/arabic characters for OCR development and evaluation
Author :
Khosravi, Sara ; Razzazi, Farbod ; Rezaei, Hamideh ; Sadigh, Mohammad Reza
Author_Institution :
Payasoft Co., Tehran
fYear :
2007
fDate :
12-15 Feb. 2007
Firstpage :
1
Lastpage :
4
Abstract :
In this paper, the specifications, design, and implementation issues of a comprehensive corpus of capital isolated handwritten character images for the Persian/Arabic languages are reported. The corpus has been designed for both OCR development and evaluation purposes. It contains more than 10 million characters of appropriate image quality and is supported by rich ground-truth metadata in a standard format. Evaluating the accuracy of the corpus has revealed that more than 99.9% of the images are correctly labeled and that the quality of more than 99.5% of the images is suitable for OCR development and evaluation. This corpus may be used as a standard benchmark for Persian/Arabic OCR systems.
Keywords :
handwritten character recognition; meta data; natural languages; optical character recognition; visual databases; OCR development; Persian-Arabic languages; capital isolated handwritten character image corpus; image databases; metadata; Data mining; Design methodology; Filling; Image databases; Image quality; Optical character recognition software; Robustness; Software libraries; Standards development; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-0778-1
Electronic_ISBN :
978-1-4244-1779-8
Type :
conf
DOI :
10.1109/ISSPA.2007.4555567
Filename :
4555567