DocumentCode :
2146824
Title :
Towards Improving the Accuracy of Telugu OCR Systems
Author :
Kumar, P. Pavan ; Bhagvati, Chakravarthy ; Negi, Atul ; Agarwal, Arun ; Deekshatulu, B.L.
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Hyderabad, Hyderabad, India
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
910
Lastpage :
914
Abstract :
Design of a high-accuracy OCR system is a challenging task, as system performance is affected by each of its component modules. Each module has its own impact on the overall accuracy of the OCR system, and an improvement in one module is reflected in overall system performance. In the present work, we have developed an OCR system for Telugu. Our experiments on a corpus of about 1000 images have shown that system performance is degraded by broken characters caused by the binarization module as well as by improper character segmentation. Therefore, we address the issues of handling broken characters and poor segmentation. A novel approach based on feedback from the distance measure used by the classifier is proposed to handle broken characters. For character segmentation, our proposed approach exploits the orthographic properties of Telugu script. As a result, significant improvement is obtained in the performance of the system. These algorithms are generic and may be applicable to other Indian scripts, especially south Indian scripts. In our experiments, end-to-end system performance is evaluated, which is not reported in the literature.
Keywords :
image classification; image segmentation; natural language processing; optical character recognition; Indian scripts; Telugu OCR system; Telugu script orthographic properties; binarization module; broken characters; character segmentation; classifier; component modules; Accuracy; Character recognition; Complexity theory; Databases; Error analysis; Optical character recognition software; System performance; Indian scripts; OCR system; Telugu script; system performance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.185
Filename :
6065443