DocumentCode :
2637193
Title :
Localization and extraction of text in Telugu document images
Author :
Negi, Atul ; Kasinadhuni, Nikhil
Author_Institution :
Dept. of Comput. & Inf. Sci., Hyderabad Univ., India
Volume :
2
fYear :
2003
fDate :
15-17 Oct. 2003
Firstpage :
749
Abstract :
Segmentation of document images into text and non-text regions is an important step in document image processing, so that optical character recognition can be performed on the textual portions. Although the literature generally approaches this problem in a script-independent manner, we present a system that locates and extracts regions of Telugu text based on the circular nature of the script. The process starts by computing the Sobel gradient magnitude of the gray-level image. The Hough transform for circles is then applied to locate the circular features of Telugu text. A region growing process on the located circles yields text regions with connected blocks of text. Recursive XY cuts then segment these regions into paragraph, line and word regions. A bottom-up region merging process is then used to envelop individual words. Local binarization of the word minimum bounding rectangles (MBRs) yields connected components containing glyphs for recognition. The segmentation process succeeds in extracting text from images with complex non-Manhattan layouts, which are commonly found in various Telugu magazines.
Keywords :
Hough transforms; document image processing; feature extraction; gradient methods; image segmentation; Hough transform; Sobel gradient magnitude; Telugu document images; binarization; circular features; document image segmentation; glyphs; gray level image; lines; nonManhattan layouts; optical character recognition; paragraphs; recursive XY cuts; region growing process; text extraction; text localization; word regions; Character recognition; Computational Intelligence Society; Filters; Image analysis; Image recognition; Image segmentation; Merging; Optical character recognition software; Performance analysis; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region
Print_ISBN :
0-7803-8162-9
Type :
conf
DOI :
10.1109/TENCON.2003.1273279
Filename :
1273279