DocumentCode
3021020
Title
A comparison of binarization methods for historical archive documents
Author
He, J. ; Do, Q.D.M. ; Downton, A.C. ; Kim, J.H.
Author_Institution
Dept. of Electron. Syst. Eng., Essex Univ., Colchester, UK
fYear
2005
fDate
29 Aug.-1 Sept. 2005
Firstpage
538
Abstract
This paper compares several alternative binarization algorithms for historical archive documents, by evaluating their effect on end-to-end word recognition performance in a complete archive document recognition system utilising a commercial OCR engine. The algorithms evaluated are: global thresholding; Niblack's and Sauvola's algorithms; adaptive versions of Niblack's and Sauvola's algorithms; and Niblack's and Sauvola's algorithms applied to background removed images. We found that, for our archive documents, Niblack's algorithm can achieve better performance than Sauvola's (which has been claimed as an evolution of Niblack's algorithm), and that it also achieved better performance than the internal binarization provided as part of the commercial OCR engine.
Keywords
character recognition; document image processing; history; word processing; Niblack algorithm; Sauvola algorithm; binarization methods; commercial OCR engine; end-to-end word recognition; global thresholding; historical archive documents; Clustering algorithms; Engines; Image color analysis; Image converters; Image recognition; Image segmentation; Optical character recognition software; Pixel; Pursuit algorithms; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
ISSN
1520-5263
Print_ISBN
0-7695-2420-6
Type
conf
DOI
10.1109/ICDAR.2005.3
Filename
1575603