• DocumentCode
    1636624
  • Title
    Markov Random Field Based Text Identification from Annotated Machine Printed Documents
  • Author
    Peng, Xujun ; Setlur, Srirangaraj ; Govindaraju, Venu ; Sitaram, Ramachandrula ; Bhuvanagiri, Kiran
  • Author_Institution
    Dept. of Comput. Sci. & Eng., SUNY at Buffalo, Amherst, NY, USA
  • fYear
    2009
  • Firstpage
    431
  • Lastpage
    435
  • Abstract
    In this paper, we describe an approach to segment handwritten text, machine printed text, and noise from annotated machine printed documents. Three categories of word-level features are extracted. We use a modified K-Means clustering algorithm for classification, followed by a relabeling procedure using a Markov Random Field (MRF) based on a concept of neighboring patches and Belief Propagation (BP) rules. Experimental results on an imbalanced data set show that our approach achieves an overall recall of 96.33%.
  • Keywords
    Markov processes; document image processing; feature extraction; image classification; image segmentation; pattern clustering; random processes; text analysis; Markov random field; annotated machine printed document; belief propagation; feature extraction; k-mean clustering algorithm; machine printed text; segment handwritten text; text identification; Classification algorithms; Feature extraction; Gabor filters; Handwriting recognition; Hidden Markov models; Image segmentation; Markov random fields; Optical character recognition software; Text analysis; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.237
  • Filename
    5277639