DocumentCode :
2530439
Title :
Combining spatial and transform features for the recognition of middle zone components of Telugu
Author :
Sastry, A. S. Chandrasekhara ; Lanka, Satyaprasad ; Clee, P. Paul ; Reddy, L. Pratap
Author_Institution :
GVP Coll. of Eng., Visakhapatnam
fYear :
2008
fDate :
19-21 Nov. 2008
Firstpage :
1
Lastpage :
5
Abstract :
The transformation from the traditional paper-based society to a truly paperless information society requires a large body of knowledge and suitable algorithmic approaches in the area of Document Image Processing. Progress in Indic script analysis has gained momentum in recent years. Individual characters in these scripts undergo a large number of shape variations because of the complex nature of the canonical structure, which resembles the phonetic sequence. The major approach found in the literature is to separate the individual components and establish the relationships between them during recognition. In this paper, an attempt is made to extract Middle Zone Components by combining the Component model and the Zone Separation model on Telugu document images. Recognition of middle zone components is achieved with a novel technique that combines spatial features, which capture the topological characteristics, with a transform feature for effective classification. A tree classifier is adopted with Euler Number, Compact Ratio and Zernike moment as features. An unsupervised training strategy is adopted to identify the Middle Zone components. The optimum size of the training set is evaluated for various font sizes.
Keywords :
character recognition; document image processing; image classification; image recognition; unsupervised learning; Indic script analysis; Telugu script; compact ratio; middle zone components recognition; phonetic sequence; tree classifier; unsupervised training strategy; Character recognition; Classification tree analysis; Data mining; Document image processing; Educational institutions; Feature extraction; Image segmentation; Optical character recognition software; Shape; Writing; Canonical Structure; Middle zone components; Projection profile; Spatial and Transform features
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON 2008 - 2008 IEEE Region 10 Conference
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4244-2408-5
Electronic_ISBN :
978-1-4244-2409-2
Type :
conf
DOI :
10.1109/TENCON.2008.4766721
Filename :
4766721