DocumentCode
2530309
Title
Gabor filter based block energy analysis for text extraction from digital document images
Author
Raju, S. Sabari ; Pati, Peeta Basa ; Ramakrishnan, A.G.
Author_Institution
Dept. of Electr. Eng., Indian Inst. of Sci., Bangalore, India
fYear
2004
fDate
2004
Firstpage
233
Lastpage
243
Abstract
Extraction of text areas is a necessary first step before a complex document image can be passed to a character recognition task. In digital libraries, such OCR'ed text facilitates access to the image of a document page through keyword search. Gabor filters, known to simulate certain characteristics of the human visual system (HVS), have been employed for this task by many researchers on scanned document images. Adapting such a scheme to camera-based document images is a relatively new approach. Moreover, designing appropriate filters to separate text areas, which are assumed to be rich in high-frequency components, from nontext areas is a difficult task. The difficulty increases if the clutter is also rich in high-frequency components. Other reported works on separating text from nontext areas have used geometrical/structural information, such as the shape and size of regions in binarized document images. In this work, we combine the two approaches. We use connected component analysis (CCA) on binarized images to segment nontext areas based on the size of the connected regions. A Gabor function based filter bank is used to separate text and nontext areas of comparable size. The technique is shown to work efficiently on different kinds of scanned document images, camera-captured document images and, sometimes, on scenic images.
Keywords
digital libraries; document image processing; feature extraction; optical character recognition; text analysis; Gabor filters; Gabor function based filter bank; OCR; binarized document images; block energy analysis; camera based document images; connected component analysis; digital document images; human visual system; multichannel filtering; text extraction; Cameras; Frequency; Humans; Image analysis; Image recognition; Keyword search; Software libraries; Text recognition; Visual system
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
Print_ISBN
0-7695-2088-X
Type
conf
DOI
10.1109/DIAL.2004.1263252
Filename
1263252