DocumentCode :
2220098
Title :
Separating text and background in degraded document images - a comparison of global thresholding techniques for multi-stage thresholding
Author :
Leedham, Graham ; Varma, Saket ; Patankar, Anish ; Govindaraju, Venu
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
fYear :
2002
fDate :
2002
Firstpage :
244
Lastpage :
249
Abstract :
Before any processing of the textual content of a document image can be performed, the text must be separated from the background of the image. Several thresholding algorithms have previously been proposed and are widely used in document processing. None has been shown to be effective at thresholding difficult documents where the background and foreground are non-uniform. In this paper we investigate the use of three global thresholding algorithms (Otsu's, Kapur's entropy and Solihin's quadratic integral ratio (QIR)) as the first stage in a multi-stage thresholding algorithm for use on degraded document images. It is concluded that Otsu's and Kapur's algorithms do not work well for difficult documents, as they tend to over-threshold the image and thus lose much of the useful information. The QIR algorithm is more accurate in separating the foreground and background in these images, leaving a range of undecided (fuzzy) pixels for processing in a subsequent stage.
Keywords :
document image processing; image segmentation; background; degraded document images; document image; document processing; foreground; global thresholding; multi-stage thresholding; quadratic integral ratio; Degradation; Entropy; Image analysis; Image recognition; Performance analysis; Pixel; Text analysis; Text recognition; Venus; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontiers in Handwriting Recognition, 2002. Proceedings. Eighth International Workshop on
Print_ISBN :
0-7695-1692-0
Type :
conf
DOI :
10.1109/IWFHR.2002.1030917
Filename :
1030917
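Note :
The abstract names Otsu's method as one of the global thresholding algorithms compared as a first stage. As a rough illustration only, the following is a minimal sketch of the textbook form of Otsu's method (choosing the grey level that maximises between-class variance), not the authors' multi-stage implementation. The function names `otsu_threshold` and `binarise`, the flat list of 8-bit grey values, and the assumption of dark text on a lighter background are all hypothetical choices made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    `pixels` is assumed to be a flat sequence of ints in [0, levels).
    """
    # Build the grey-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the class <= t
    sum0 = 0  # sum of grey values in the class <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class is still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(pixels, t):
    """Map each pixel to 0 (assumed text, value <= t) or 1 (assumed background)."""
    return [1 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical toy data: half dark "ink" pixels, half bright "paper" pixels.
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print(t, sum(binarise(sample, t)))
```

A single global threshold of this kind assigns every pixel to one of two classes. According to the abstract, the QIR algorithm instead leaves a range of undecided pixels for a later stage; that algorithm is not sketched here because the record does not describe it in enough detail.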