• DocumentCode
    1994909
  • Title

    Text identification in noisy document images using Markov random model

  • Author

    Zheng, Yefeng ; Li, Huiping ; Doermann, David

  • Author_Institution
    Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    599
  • Abstract
    In this paper we address the problem of identifying text in noisy documents. We segment handwriting and distinguish it from machine-printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting are significantly different. Our novelty is that we treat noise as a separate class and model it based on selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise. We further exploit context to refine the classification: a Markov random field (MRF) based approach models the geometrical structure of the printed text, handwriting and noise to rectify misclassifications. Experimental results show our approach is promising and robust, and can significantly improve page segmentation results in noisy documents.
  • Keywords
    Markov processes; document image processing; feature extraction; image classification; image segmentation; random processes; Fisher classifier; Markov random field; handwriting identification; handwriting segmentation; machine printed text; noisy document image; text identification; Degradation; Educational institutions; Filtering; Handwriting recognition; Histograms; Laboratories; Markov random fields; Random media; Text analysis; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type

    conf

  • DOI
    10.1109/ICDAR.2003.1227734
  • Filename
    1227734