Title :
Experimental comparisons of binarization and multi-thresholding methods on document images
Author :
O'Gorman, Lawrence
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
Thresholding methods are applied here to document images and their experimental results are compared. In one set of tests, different thresholding methods are used to binarize document images, optical character recognition (OCR) is then performed on the resulting text, and the recognition results are compared. In the other set of tests, multi-thresholding is performed on document images (to obtain three or more levels for images with more than binary levels) and the results are compared. Four thresholding methods are compared in the experiments: a discriminant analysis method, a maximum entropy method, a moment-preserving method, and a connectivity-preserving method. A method using a minimum-error criterion is also commented upon. The moment-preserving and connectivity-preserving methods are found to yield the best OCR results from the binarized images, and the connectivity-preserving method yields the fewest binarization and multi-thresholding failures.
Keywords :
document handling; OCR; binarization; connectivity-preserving method; discriminant analysis method; document images; maximum entropy method; minimum-error criterion; moment-preserving method; multi-thresholding methods; optical character recognition; Character recognition; Entropy; Gray-scale; Image processing; Image recognition; Optical character recognition software; Performance evaluation; Printing; Testing; Text recognition;
Conference_Title :
Proceedings of the 12th IAPR International Conference on Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing
Conference_Location :
Jerusalem
Print_ISBN :
0-8186-6270-0
DOI :
10.1109/ICPR.1994.576954