• DocumentCode
    1054335
  • Title

    Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model

  • Author

    Kumar, Sunil ; Gupta, Rajat ; Khanna, Nitin ; Chaudhury, Santanu ; Joshi, Shiv Dutt

  • Author_Institution
    IBM India Res. Lab., Delhi
  • Volume
    16
  • Issue
    8
  • fYear
    2007
  • Firstpage
    2117
  • Lastpage
    2128
  • Abstract
    In this paper, we have proposed a novel scheme for the extraction of textual areas of an image using globally matched wavelet filters. A clustering-based technique has been devised for estimating globally matched wavelet filters using a collection of groundtruth images. We have extended our text extraction scheme for the segmentation of document images into text, background, and picture components (which include graphics and continuous tone images). Multiple, two-class Fisher classifiers have been used for this purpose. We also exploit contextual information by using a Markov random field formulation-based pixel labeling scheme for refinement of the segmentation results. Experimental results have established effectiveness of our approach.
  • Keywords
    Markov processes; document image processing; feature extraction; filtering theory; image classification; image segmentation; pattern clustering; random processes; text analysis; wavelet transforms; MRF-based pixel labeling scheme; Markov random field formulation; background components; clustering-based technique; contextual information; document image segmentation; globally matched wavelet filters; groundtruth images; picture components; text extraction; two-class Fisher classifiers; Asia; Data mining; Discrete wavelet transforms; Graphics; Image color analysis; Image segmentation; Labeling; Layout; Markov random fields; Matched filters; α-expansion; Markov random field (MRF); document image; globally matched wavelets (GMWs); matched wavelets; scene image; Algorithms; Artificial Intelligence; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Natural Language Processing; Pattern Recognition, Automated; Printing; Reproducibility of Results; Sensitivity and Specificity;
  • fLanguage
    English
  • Journal_Title
    Image Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1057-7149
  • Type

    jour

  • DOI
    10.1109/TIP.2007.900098
  • Filename
    4271529