DocumentCode
2013151
Title
Middle Zone Component Extraction and Recognition of Telugu Document Image
Author
Pratap, R.L. ; Satyaprasad, L. ; Sastry, A.
Author_Institution
JNTU Coll. of Eng., Hyderabad
Volume
2
fYear
2007
fDate
23-26 Sept. 2007
Firstpage
584
Lastpage
588
Abstract
Telugu is one of the ancient languages of South India. It has a complex orthography with a large number of distinct character shapes composed of simple and compound characters. Work reported in the literature until recently is based on the connected component approach. Less attention has been paid to a generalized character model and its application in OCR development. A script syllable follows a canonical structure in which a consonant-vowel core is preceded by one or two optional consonants. The formation of a syllable possesses a unique structural nature. In the present work, structural features of the syllable and the component model are combined to extract middle zone components. The shape of the middle zone components is closely related to a circle, whereas other components have different topological features. A recognition rate of 99 percent is observed with the proposed method.
Keywords
document image processing; feature extraction; image recognition; OCR; South India; Telugu document image recognition; middle zone component extraction; orthography; script syllable; Character recognition; Data mining; Educational institutions; Feature extraction; Head; Image recognition; Image segmentation; Optical character recognition software; Shape; Writing
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location
Parana
ISSN
1520-5363
Print_ISBN
978-0-7695-2822-9
Type
conf
DOI
10.1109/ICDAR.2007.4376982
Filename
4376982