  • DocumentCode
    3021020
  • Title
    A comparison of binarization methods for historical archive documents
  • Author
    He, J. ; Do, Q.D.M. ; Downton, A.C. ; Kim, J.H.
  • Author_Institution
    Dept. of Electron. Syst. Eng., Essex Univ., Colchester, UK
  • fYear
    2005
  • fDate
    29 Aug.-1 Sept. 2005
  • Firstpage
    538
  • Abstract
    This paper compares several binarization algorithms for historical archive documents by evaluating their effect on end-to-end word recognition performance in a complete archive document recognition system that uses a commercial OCR engine. The algorithms evaluated are: global thresholding; Niblack's and Sauvola's algorithms; adaptive versions of Niblack's and Sauvola's algorithms; and Niblack's and Sauvola's algorithms applied to background-removed images. We found that, for our archive documents, Niblack's algorithm can achieve better performance than Sauvola's (which has been claimed to be an evolution of Niblack's algorithm), and that it also outperformed the internal binarization provided by the commercial OCR engine.
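The three basic methods compared above differ only in how the per-pixel threshold T is chosen. As a minimal sketch, assuming the textbook forms of the two local methods (Niblack: T = m + k·s; Sauvola: T = m·(1 + k·(s/R − 1)), where m and s are the mean and standard deviation of a window around the pixel), the code below computes each threshold and marks pixels darker than T as ink. The window half-width w, k = −0.2 for Niblack, k = 0.5 and R = 128 for Sauvola, and the fixed global threshold of 128 are common literature defaults assumed here, not the settings reported in the paper; the paper's adaptive and background-removed variants are not reproduced.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]  # grey levels: 0 = black, 255 = white (assumed convention)

def local_mean_std(img: Image, x: int, y: int, w: int) -> Tuple[float, float]:
    """Mean and population std. dev. of the (2w+1)x(2w+1) window at (x, y), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - w), min(rows, y + w + 1))
            for i in range(max(0, x - w), min(cols, x + w + 1))]
    m = sum(vals) / len(vals)
    s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
    return m, s

def niblack(k: float = -0.2) -> Callable[[float, float], float]:
    # Niblack: T = m + k * s; k < 0 pulls T below the local mean for dark text on a light page
    return lambda m, s: m + k * s

def sauvola(k: float = 0.5, R: float = 128.0) -> Callable[[float, float], float]:
    # Sauvola: T = m * (1 + k * (s / R - 1)); R is the assumed dynamic range of s
    return lambda m, s: m * (1 + k * (s / R - 1))

def global_threshold(t: float = 128.0) -> Callable[[float, float], float]:
    # Global thresholding ignores local statistics: one T for the whole page
    return lambda m, s: t

def binarize(img: Image, threshold: Callable[[float, float], float], w: int = 7) -> List[List[bool]]:
    """Return True where a pixel is classified as ink (strictly darker than its threshold)."""
    return [[img[y][x] < threshold(*local_mean_std(img, x, y, w))
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

For example, on a 5×5 light page (value 200) with a dark vertical stroke (value 50) in column 2, `binarize(page, niblack(), w=1)`, `binarize(page, sauvola(), w=1)` and `binarize(page, global_threshold(), w=1)` all mark only column 2 as ink. On real archive images the methods diverge: a locally computed T adapts to uneven backgrounds that a single global T cannot, which is why the comparison in the paper is non-trivial.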
  • Keywords
    character recognition; document image processing; history; word processing; Niblack algorithm; Sauvola algorithm; binarization methods; commercial OCR engine; end-to-end word recognition; global thresholding; historical archive documents; Clustering algorithms; Engines; Image color analysis; Image converters; Image recognition; Image segmentation; Optical character recognition software; Pixel; Pursuit algorithms; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type
    conf
  • DOI
    10.1109/ICDAR.2005.3
  • Filename
    1575603