Title :
Partitioning of feature space by iterative classification for degraded document image binarisation
Author :
Valizadeh, M. ; Kabir, Ehsanollah
Author_Institution :
Dept. of Electr. & Comput. Eng., Tarbiat Modarres Univ., Tehran, Iran
fDate :
8/1/2012 12:00:00 AM
Abstract :
Proper partitioning of the feature space into text and background regions is essential in document image binarisation. This study presents an iterative classification algorithm that efficiently partitions a two-dimensional feature space into text and background regions. It uses the result of Niblack's binarisation algorithm as training data and employs its characteristics to define classification rules. In each iteration, it labels only those points of the feature space that can be classified reliably and leaves the classification of the remaining points to later iterations. The classification result of a point in the current iteration affects the classification of its neighbours in subsequent iterations and makes them more likely to be classified correctly. After a few iterations, the feature space is partitioned into two regions associated with the text and background pixels. After partitioning, two global thresholding methods are applied as an extra refinement of the text class to make the proposed algorithm robust against bleeding-through and shadow-through degradations. Finally, each pixel is labelled as either text or background according to its corresponding region in the feature space. The authors' binarisation algorithm outperformed six well-known algorithms on three datasets and is appropriate for various types of degraded images.
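The abstract names Niblack's binarisation as the source of training data. As a rough illustration of that starting point only, the sketch below implements the commonly cited Niblack rule, in which a pixel's threshold is the local mean plus k times the local standard deviation. The function name `niblack_binarise`, the window size, the value of `k`, and the assumption that text is darker than the background are illustrative choices, not details taken from the paper, and the iterative feature-space classification itself is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def niblack_binarise(gray, window=25, k=-0.2):
    """Return a boolean mask that is True where a pixel is labelled as text.

    Assumptions (not from the paper): `window` is odd, text is darker than
    the background, and k = -0.2 is used as a typical illustrative value.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    gray = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    # Reflect-pad so every pixel has a full neighbourhood.
    padded = np.pad(gray, pad, mode="reflect")
    # View of all window x window neighbourhoods, one per original pixel.
    windows = sliding_window_view(padded, (window, window))
    local_mean = windows.mean(axis=(-2, -1))
    local_std = windows.std(axis=(-2, -1))
    # Niblack's local threshold: T = m + k * s.
    threshold = local_mean + k * local_std
    return gray < threshold
```

This direct windowed computation favours clarity over speed; for large images an integral-image or filtered-mean formulation would usually be preferred.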
Keywords :
document image processing; feature extraction; iterative methods; pattern classification; Niblack binarisation algorithm; bleeding-through degradation; degraded document image binarisation; feature space partitioning; global thresholding methods; iterative classification; shadow-through degradation; text class refinement
Journal_Title :
Image Processing, IET
DOI :
10.1049/iet-ipr.2011.0399