Determination of optimal features database for OCR of printed Telugu text

Author

C. Vasantha Lakshmi;Sarika Singh;C. Patvardhan

Author_Institution

Dept. of Physics & Computer Science, Dayalbagh Educational Institute, Agra, U.P.

fYear

2015

Firstpage

1

Lastpage

6

Abstract

Optical Character Recognition (OCR) systems are being developed for their numerous applications, even for Indian scripts such as Telugu, which are complicated by their large number of symbols. OCR systems typically store pre-computed features of the symbols to be recognized in a database. An unknown symbol is recognized by finding the symbol in the database that is nearest to it in feature space. Designing an appropriate database is therefore a critical step, especially when the OCR system targets numerous symbols in multiple fonts and sizes. The goal is an OCR system with short recognition times and high recognition accuracy. The naive approach of storing the features of all symbols in all fonts and sizes in the database may be counterproductive on both counts. Experimental results on text document images with multiple fonts and sizes show that the database design strategy for OCR of printed Telugu text proposed in this paper achieves both objectives. This is the first reported approach to such a database design for Telugu OCR.
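
The nearest-match lookup described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not name the features or the distance measure, so the Euclidean distance, the three-element feature vectors, and the names `FEATURE_DB` and `nearest_symbol` are all assumptions made for the example.

```python
import math

# Hypothetical feature database: symbol label -> pre-computed feature vector.
# Labels and values are invented for illustration; they are not from the paper.
FEATURE_DB = {
    "symbol_a": [0.10, 0.80, 0.30],
    "symbol_b": [0.90, 0.20, 0.40],
    "symbol_c": [0.50, 0.50, 0.90],
}


def nearest_symbol(features, database):
    """Return the label whose stored vector is closest to `features`.

    Euclidean distance is an assumed metric; the abstract only says
    "nearest in feature space".
    """
    return min(database, key=lambda label: math.dist(features, database[label]))


# Features of an unknown symbol, again invented for the example.
match = nearest_symbol([0.85, 0.25, 0.35], FEATURE_DB)
```

A linear scan like this compares the unknown symbol against every stored entry, so lookup time grows with the number of entries. That is one reason a database holding every symbol in every font and size can slow recognition.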

Keywords

"Databases","Optical character recognition software","Feature extraction","Classification algorithms","Target recognition","Histograms","Prototypes"

Publisher

IEEE

Conference_Titel

2015 39th National Systems Conference (NSC)

Type

conf

DOI

10.1109/NATSYS.2015.7489112

Filename

7489112