• DocumentCode
    3776384
  • Title
    Determination of optimal features database for OCR of printed Telugu text
  • Author
    C. Vasantha Lakshmi;Sarika Singh;C. Patvardhan
  • Author_Institution
    Dept. of Physics & Com. Sc., Dayalbagh Educational Institute, Agra, U.P.
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Optical Character Recognition (OCR) systems are being developed for their numerous applications, even for Indian scripts like Telugu, which are complicated by their large number of symbols. OCR systems typically store pre-computed features of the symbols to be recognized in a database. An unknown symbol is recognized by finding the symbol in the database that is nearest to it in feature space. Design of an appropriate database is, therefore, a critical step, especially when the OCR system targets recognition of numerous symbols in multiple fonts and sizes. The goal is an OCR system with short recognition times and high recognition accuracy. The naive approach of putting the features of all symbols in all fonts and sizes into the database might be counterproductive on both counts. Experimental results on text document images with multiple fonts and sizes show that the database design strategy proposed in this paper for OCR of printed Telugu text achieves both objectives. This is the first reported approach to such a database design for Telugu OCR.
  • Keywords
    "Databases","Optical character recognition software","Feature extraction","Classification algorithms","Target recognition","Histograms","Prototypes"
  • Publisher
    IEEE
  • Conference_Titel
    Systems Conference (NSC), 2015 39th National
  • Type
    conf
  • DOI
    10.1109/NATSYS.2015.7489112
  • Filename
    7489112