• DocumentCode
    2307733
  • Title

    Text extraction from degraded document images

  • Author

    Hedjam, Rachid ; Moghaddam, Reza Farrahi ; Cheriet, Mohamed

  • Author_Institution
    Synchromedia Lab. for Multimedia Commun. in Telepresence, Ecole de Technol. Supérieure, Montréal, QC, Canada
  • fYear
    2010
  • fDate
    5-6 July 2010
  • Firstpage
    247
  • Lastpage
    252
  • Abstract
    In this work, a robust segmentation method for text extraction from historical document images is presented. The method is based on Markovian-Bayesian clustering on local graphs at both the pixel and regional scales. It consists of three steps. In the first step, an over-segmented map of the input image is created. The resulting map provides rich and accurate semi-mosaic fragments. In the second step, the map is processed: similar and adjoining sub-regions are merged to form accurate text shapes. In the final step, the output of the second step, which contains accurate shapes, is clustered with a fixed number of classes to obtain the segmentation. The method makes significant use of local and spatial correlation and coherence, both across the image and between stroke parts, and is therefore very robust to degradation. The resulting segmented text is smooth, and weak connections and loops are preserved thanks to the robust nature of the method. The output can be used in subsequent skeletonization processes, which require preservation of the text topology to achieve high performance. The method is tested on real degraded document images with promising results.
  • Keywords
    Bayes methods; Markov processes; document image processing; feature extraction; image segmentation; image thinning; text analysis; Markovian-Bayesian clustering; degraded document image; historical document image; robust segmentation method; skeletonization process; text extraction; text topology; Document image; Graph-partitioning; Image binarization; Image segmentation; MRF;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Visual Information Processing (EUVIP), 2010 2nd European Workshop on
  • Conference_Location
    Paris
  • Print_ISBN
    978-1-4244-7288-8
  • Type

    conf

  • DOI
    10.1109/EUVIP.2010.5699135
  • Filename
    5699135