• DocumentCode
    2454288
  • Title
    Binarization of low quality text using a Markov random field model
  • Author
    Wolf, Christian ; Doermann, David
  • Author_Institution
    Lab. Reconnaissance de Formes et Vision, Inst. Nat. des Sci. Appliquees de Lyon, Villeurbanne, France
  • Volume
    3
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    160
  • Abstract
    Binarization techniques have been developed in the document analysis community for over 30 years and many algorithms have been used successfully. On the other hand, document analysis tasks are more and more frequently being applied to multimedia documents such as video sequences. Due to low resolution and lossy compression, the binarization of text included in the frames is a non-trivial task. Existing techniques work without a model of the spatial relationships in the image, which makes them less powerful. We introduce a new technique based on a Markov random field model of the document. The model parameters (clique potentials) are learned from training data and the binary image is estimated in a Bayesian framework. The performance is evaluated using commercial OCR software.
  • Keywords
    Bayes methods; Markov processes; document image processing; multimedia computing; probability; simulated annealing; Bayesian method; Gibbs distributions; Markov random field; document analysis; low quality text binarization; multimedia documents; optimization; Algorithm design and analysis; Bayesian methods; Image coding; Image sequence analysis; Markov random fields; Optical character recognition software; Spatial resolution; Text analysis; Training data; Video sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2002. Proceedings. 16th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-1695-X
  • Type
    conf
  • DOI
    10.1109/ICPR.2002.1047819
  • Filename
    1047819