• DocumentCode
    2013151
  • Title
    Middle Zone Component Extraction and Recognition of Telugu Document Image
  • Author
    Pratap, R.L. ; Satyaprasad, L. ; Sastry, A.
  • Author_Institution
    JNTU Coll. of Eng., Hyderabad
  • Volume
    2
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    584
  • Lastpage
    588
  • Abstract
    Telugu is one of the ancient languages of South India. It has a complex orthography with a large number of distinct character shapes composed of simple and compound characters. Work reported in the literature until recently has been based on the connected component approach, and less attention has been paid to a generalized character model and its application to OCR development. The script's syllable follows a canonical structure in which a consonant-vowel core is preceded by one or two optional consonants. The formation of a syllable therefore has a unique structural nature. In the present work, the structural features of the syllable and the component model are combined to extract middle zone components. The shape of the middle zone components closely resembles a circle, whereas other components have different topological features. A recognition rate of 99 percent is observed with the proposed method.
  • Keywords
    document image processing; feature extraction; image recognition; OCR; South India; Telugu document image recognition; middle zone component extraction; orthography; script syllable; Character recognition; Data mining; Educational institutions; Feature extraction; Head; Image recognition; Image segmentation; Optical character recognition software; Shape; Writing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4376982
  • Filename
    4376982