DocumentCode :
3249380
Title :
Robust Text Line, Word And Character Extraction from Telugu Document Image
Author :
Koppula, V.K. ; Atul, Negi ; Garain, Utpal
Author_Institution :
Dept. of CSE, CMR Coll. of Eng. & Tech., Hyderabad, India
fYear :
2009
fDate :
16-18 Dec. 2009
Firstpage :
269
Lastpage :
272
Abstract :
Designing an OCR system for Indian languages is, in general, more complex than designing one for European languages because of their linguistic complexity. Efforts are under way to develop efficient OCR systems for Indian languages, especially for Telugu, a popular South Indian language. In this paper, we propose a method for reliable extraction of text lines, words and characters from document images of Telugu script. For text line segmentation, we first establish the relationship between the connected components and then cluster the connected components of each line using a vertical spatial relation and a nearest neighbor algorithm. For word segmentation, the space between each pair of adjacent characters is computed, and the spaces are clustered into word spaces and character spaces. Consonant and vowel modifiers are segregated from the word image, and the characters are segmented.
Keywords :
document image processing; feature extraction; image segmentation; natural language processing; optical character recognition; text analysis; Indian languages; OCR system; Telugu document image; character extraction; linguistic complexity; nearest neighbor algorithm; text line extraction; text line segmentation; vertical spatial relation; word extraction; word segmentation; Carbon capture and storage; Character recognition; Clustering algorithms; Computational Intelligence Society; Computer vision; Educational institutions; Image segmentation; Nearest neighbor searches; Optical character recognition software; Robustness;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Emerging Trends in Engineering and Technology (ICETET), 2009 2nd International Conference on
Conference_Location :
Nagpur
Print_ISBN :
978-1-4244-5250-7
Electronic_ISBN :
978-0-7695-3884-6
Type :
conf
DOI :
10.1109/ICETET.2009.196
Filename :
5395511