DocumentCode
1054335
Title
Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model
Author
Kumar, Sunil ; Gupta, Rajat ; Khanna, Nitin ; Chaudhury, Santanu ; Joshi, Shiv Dutt
Author_Institution
IBM India Res. Lab., Delhi
Volume
16
Issue
8
fYear
2007
Firstpage
2117
Lastpage
2128
Abstract
In this paper, we propose a novel scheme for extracting textual areas of an image using globally matched wavelet filters. A clustering-based technique has been devised for estimating globally matched wavelet filters from a collection of groundtruth images. We have extended our text extraction scheme to the segmentation of document images into text, background, and picture components (which include graphics and continuous tone images). Multiple two-class Fisher classifiers have been used for this purpose. We also exploit contextual information by using a Markov random field formulation-based pixel labeling scheme to refine the segmentation results. Experimental results have established the effectiveness of our approach.
Keywords
Markov processes; document image processing; feature extraction; filtering theory; image classification; image segmentation; pattern clustering; random processes; text analysis; wavelet transforms; MRF-based pixel labeling scheme; Markov random field formulation; background components; clustering-based technique; contextual information; document image segmentation; globally matched wavelet filters; groundtruth images; picture components; text extraction; two-class Fisher classifiers; Asia; Data mining; Discrete wavelet transforms; Graphics; Image color analysis; Image segmentation; Labeling; Layout; Markov random fields; Matched filters; $\alpha$-expansion; Markov random field (MRF); document image; globally matched wavelets (GMWs); matched wavelets; scene image; Algorithms; Artificial Intelligence; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Natural Language Processing; Pattern Recognition, Automated; Printing; Reproducibility of Results; Sensitivity and Specificity
fLanguage
English
Journal_Title
Image Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1057-7149
Type
jour
DOI
10.1109/TIP.2007.900098
Filename
4271529