Title :
Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model
Author :
Kumar, Sunil ; Gupta, Rajat ; Khanna, Nitin ; Chaudhury, Santanu ; Joshi, Shiv Dutt
Author_Institution :
IBM India Res. Lab., Delhi
Abstract :
In this paper, we propose a novel scheme for extracting the textual areas of an image using globally matched wavelet filters. A clustering-based technique is devised for estimating globally matched wavelet filters from a collection of ground-truth images. We extend our text extraction scheme to segment document images into text, background, and picture components (which include graphics and continuous-tone images). Multiple two-class Fisher classifiers are used for this purpose. We also exploit contextual information through a pixel labeling scheme based on a Markov random field (MRF) formulation, which refines the segmentation results. Experimental results establish the effectiveness of our approach.
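The abstract names multiple two-class Fisher classifiers as the labeling step. Below is a minimal sketch of a generic two-class Fisher linear discriminant, assuming each pixel or block has already been reduced to a numeric feature vector (for example, local energies of wavelet filter responses). The feature layout, the small ridge term, the midpoint threshold, the toy data, and the names `FisherTwoClass`, `fit`, and `predict` are all assumptions made for illustration. This is not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of feature vectors."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def solve_linear(a: List[List[float]], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


class FisherTwoClass:
    """Two-class Fisher linear discriminant: w = S_W^{-1} (m_pos - m_neg)."""

    def __init__(self, ridge: float = 1e-6) -> None:
        self.ridge = ridge  # assumed small diagonal term so S_W is invertible
        self.w: Vector = []
        self.threshold = 0.0

    def project(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def fit(self, pos, neg) -> "FisherTwoClass":
        m_pos, m_neg = mean_vector(pos), mean_vector(neg)
        d = len(m_pos)
        # Within-class scatter S_W: sum of outer products of centred samples.
        sw = [[0.0] * d for _ in range(d)]
        for rows, m in ((pos, m_pos), (neg, m_neg)):
            for r in rows:
                diff = [r[j] - m[j] for j in range(d)]
                for i in range(d):
                    for j in range(d):
                        sw[i][j] += diff[i] * diff[j]
        for i in range(d):
            sw[i][i] += self.ridge
        self.w = solve_linear(sw, [m_pos[j] - m_neg[j] for j in range(d)])
        # Assumed decision rule: midpoint of the projected class means.
        self.threshold = 0.5 * (self.project(m_pos) + self.project(m_neg))
        return self

    def predict(self, x: Sequence[float]) -> bool:
        """True if x falls on the positive-class side of the threshold."""
        return self.project(x) > self.threshold


# Toy 2-D features, invented for illustration only.
text_feats = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [3.5, 2.5]]
background_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clf = FisherTwoClass().fit(text_feats, background_feats)
print(clf.predict([3.0, 3.0]), clf.predict([0.2, 0.3]))  # True False
```

For the three-way text/background/picture labeling that the abstract describes, one hypothetical arrangement would train one such classifier per class pair or per class-versus-rest. The abstract does not say how the paper combines them.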
Keywords :
Markov processes; document image processing; feature extraction; filtering theory; image classification; image segmentation; pattern clustering; random processes; text analysis; wavelet transforms; MRF-based pixel labeling scheme; Markov random field formulation; background components; clustering-based technique; contextual information; document image segmentation; globally matched wavelet filters; groundtruth images; picture components; text extraction; two-class Fisher classifiers; Asia; Data mining; Discrete wavelet transforms; Graphics; Image color analysis; Image segmentation; Labeling; Layout; Markov random fields; Matched filters; α-expansion; Markov random field (MRF); document image; globally matched wavelets (GMWs); matched wavelets; scene image; Algorithms; Artificial Intelligence; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Natural Language Processing; Pattern Recognition, Automated; Printing; Reproducibility of Results; Sensitivity and Specificity;
Journal_Title :
IEEE Transactions on Image Processing
DOI :
10.1109/TIP.2007.900098