Middle Zone Component Extraction and Recognition of Telugu Document Image

Author

Pratap, R.L. ; Satyaprasad, L. ; Sastry, A.

Author_Institution

JNTU Coll. of Eng., Hyderabad

Volume

2

fYear

2007

fDate

23-26 Sept. 2007

Firstpage

584

Lastpage

588

Abstract

Telugu is one of the ancient languages of South India. It has a complex orthography with a large number of distinct character shapes composed of simple and compound characters. Work reported in the literature until recently has been based on the connected component approach. Little attention has been paid to a generalized character model and its application in OCR development. A script syllable follows a canonical structure in which a consonant-vowel core is preceded by one or two optional consonants. The formation of a syllable possesses a unique structural nature. In the present work, structural features of the syllable and the component model are combined to extract middle zone components. The shape of the middle zone components is closely related to a circle, whereas other components have different topological features. A recognition rate of 99 percent is observed with the proposed method.
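
The abstract's observation that middle zone components are close to a circle can be illustrated with a simple shape test. The sketch below is not the authors' method, which the abstract does not describe in detail. It assumes a component is given as a list of (x, y) foreground pixel coordinates and uses a hypothetical measure, the coefficient of variation of pixel distances from the centroid, with an arbitrary threshold of 0.25. A ring-like stroke keeps its pixels at a nearly constant distance from the centroid, so the measure is small, while an elongated stroke gives a larger value.

```python
import math


def radial_variation(pixels):
    """Coefficient of variation of pixel distances from the centroid.

    `pixels` is a list of (x, y) foreground coordinates of one component.
    This measure is an illustrative assumption, not taken from the paper.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("component has no pixels")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in pixels]
    mean = sum(dists) / n
    if mean == 0:
        return 0.0  # a single pixel has no radial spread
    var = sum((d - mean) ** 2 for d in dists) / n
    return math.sqrt(var) / mean


def looks_circular(pixels, threshold=0.25):
    """True when the stroke pixels lie at a nearly constant radius.

    The threshold is a hypothetical value chosen for this demo only.
    """
    return radial_variation(pixels) < threshold


def make_ring(radius, half_width=1):
    """Synthetic ring-shaped stroke centred on the origin (test data)."""
    span = range(-radius - half_width, radius + half_width + 1)
    return [
        (x, y)
        for x in span
        for y in span
        if abs(math.hypot(x, y) - radius) <= half_width
    ]


def make_bar(length):
    """Synthetic horizontal one-pixel-thick stroke (test data)."""
    return [(x, 0) for x in range(length)]


ring = make_ring(10)
bar = make_bar(21)
print(looks_circular(ring), looks_circular(bar))
```

On real document images the component pixels would come from a connected component labelling step, and the threshold would need tuning on labelled samples.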

Keywords

document image processing; feature extraction; image recognition; OCR; South India; Telugu document image recognition; middle zone component extraction; orthography; script syllable; Character recognition; Data mining; Educational institutions; Feature extraction; Head; Image recognition; Image segmentation; Optical character recognition software; Shape; Writing

fLanguage

English

Publisher

IEEE

Conference_Titel

Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on

Conference_Location

Parana

ISSN

1520-5363

Print_ISBN

978-0-7695-2822-9

Type

conf

DOI

10.1109/ICDAR.2007.4376982

Filename

4376982