Title :
The multistage approach to information extraction in degraded document images
Author :
Yan, Chen ; Leedham, Graham
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Global and local adaptive thresholding techniques have been shown to be effective on particular types of documents, but none produces consistently good results on all types of documents. In this paper a novel method, called the multistage approach, is presented and compared against some existing single-stage algorithms. The multistage approach recursively breaks an image down into sub-regions using quad-tree decomposition and extracts local features from each sub-region until an appropriate thresholding method can be applied to each sub-region. Quantitative analysis using word recall on 300 degraded historical images obtained from the Library of Congress demonstrates that the method is superior to any existing single-stage method.
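The recursive decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not state which local features or thresholding methods the paper uses, so everything specific here is an assumption: the features (grey-level range and standard deviation), the limits `flat_limit`, `strong_limit` and `min_size`, and the midrange threshold that stands in for "an appropriate thresholding method". It is not the authors' implementation.

```python
from statistics import pstdev

BACKGROUND, INK = 255, 0


def _binarize_region(img, out, x0, y0, w, h, flat_limit, strong_limit, min_size):
    """Threshold one region, or split it into four quadrants and recurse."""
    pixels = [img[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    lo, hi = min(pixels), max(pixels)

    # Assumed feature 1: a nearly uniform region holds no text, so it is
    # left as background (out is initialised to BACKGROUND).
    if hi - lo < flat_limit:
        return

    # Assumed feature 2: a large spread suggests ink and paper are both
    # present, so a simple threshold is applied. Regions that cannot be
    # split further are thresholded as well.
    if pstdev(pixels) >= strong_limit or w <= min_size or h <= min_size:
        threshold = (lo + hi) / 2  # midrange, a stand-in for the chosen method
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                if img[y][x] < threshold:
                    out[y][x] = INK
        return

    # Otherwise the region is ambiguous: quad-tree split and recurse.
    hw, hh = w // 2, h // 2
    quadrants = (
        (x0, y0, hw, hh),
        (x0 + hw, y0, w - hw, hh),
        (x0, y0 + hh, hw, h - hh),
        (x0 + hw, y0 + hh, w - hw, h - hh),
    )
    for qx, qy, qw, qh in quadrants:
        _binarize_region(img, out, qx, qy, qw, qh, flat_limit, strong_limit, min_size)


def multistage_binarize(img, flat_limit=30, strong_limit=50.0, min_size=2):
    """Binarize a greyscale image given as a list of rows (min_size must be >= 1)."""
    height, width = len(img), len(img[0])
    out = [[BACKGROUND] * width for _ in range(height)]
    _binarize_region(img, out, 0, 0, width, height, flat_limit, strong_limit, min_size)
    return out
```

In this sketch a faint stroke that is too weak to trigger the threshold on the whole page is still picked up once the page has been split into a quadrant where it dominates the local statistics, which is the motivation for decomposing before thresholding.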
Keywords :
document image processing; feature extraction; image segmentation; quadtrees; adaptive thresholding techniques; degraded document images; information extraction; multistage method; quadtree decomposition; quantitative analysis; single stage algorithms; Data mining; Degradation; Histograms; Image analysis; Image storage; Libraries; Pixel; Storage automation; Text analysis
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
Print_ISBN :
0-7695-2128-2
DOI :
10.1109/ICPR.2004.1334154