Title :
Robust Document Image Binarization Technique for Degraded Document Images
Author :
Bolan Su ; Shijian Lu ; Chew Lim Tan
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
Abstract :
Segmentation of text from badly degraded document images is a very challenging task due to the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient that is tolerant to text and background variation caused by different types of document degradations. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window. The proposed method is simple, robust, and involves minimal parameter tuning. It has been tested on three public datasets that were used in the recent document image binarization contests (DIBCO) 2009 and 2011 and handwritten-DIBCO 2010, and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, which are significantly higher than or close to those of the best-performing methods reported in the three contests. Experiments on the Bickley diary dataset, which consists of several challenging poor-quality document images, also show the superior performance of our proposed method compared with other techniques.
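The first step of the abstract, building a contrast map that mixes local image contrast with local image gradient, can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the function name `adaptive_contrast_map`, the fixed mixing weight `alpha`, the 3x3 window (`radius=1`), the use of the window's max and min intensities for both terms, and the rescaling of the gradient term by 255 so that both terms lie in [0, 1]. It uses plain Python lists to stay dependency-free and is not the authors' implementation.

```python
def adaptive_contrast_map(img, alpha=0.5, radius=1, eps=1e-6):
    """Sketch of an adaptive contrast map for an 8-bit greyscale image.

    img    : 2-D list of grey levels in 0..255 (rows of equal length).
    alpha  : hypothetical fixed weight between the two terms (assumption).
    radius : half-width of the square local window (assumption).
    Returns a 2-D list of floats in [0, 1].
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the local window, clipped at the image border.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            lo, hi = min(window), max(window)
            # Local image contrast: intensity range normalised by local brightness.
            contrast = (hi - lo) / (hi + lo + eps)
            # Local image gradient: raw intensity range, rescaled to [0, 1]
            # (the rescaling by 255 is an assumption of this sketch).
            gradient = (hi - lo) / 255.0
            out[y][x] = alpha * contrast + (1.0 - alpha) * gradient
    return out


# Tiny synthetic example: bright background on the left, dark "ink" on the right.
page = [[200, 200, 20, 20] for _ in range(3)]
cmap = adaptive_contrast_map(page)
```

In this toy image the map is zero inside the flat background and high along the column where background meets ink, which is the kind of response the later steps (binarizing the map and intersecting it with Canny's edge map) would rely on to pick out text stroke edge pixels. Those later steps are not shown here.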
Keywords :
document image processing; image segmentation; Bickley diary dataset; Canny edge map; DIBCO; adaptive contrast map; adaptive image contrast; background variation; degraded document image segmentation; document image binarization technique; high inter-intravariation; local image contrast; local image gradient; local threshold; local window; minimum parameter tuning; text stroke edge pixels; text variation; Degradation; Equations; Histograms; Image edge detection; Image segmentation; Mathematical model; Robustness; Adaptive image contrast; degraded document image binarization; document analysis; document image processing; pixel classification;
Journal_Title :
Image Processing, IEEE Transactions on
DOI :
10.1109/TIP.2012.2231089