• DocumentCode
    1196753
  • Title
    Decompose algorithm for thresholding degraded historical document images
  • Author
    Chen, Y. ; Leedham, G.
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
  • Volume
    152
  • Issue
    6
  • fYear
    2005
  • Firstpage
    702
  • Lastpage
    714
  • Abstract
    Numerous techniques have previously been proposed for single-stage thresholding of document images to separate the written or printed information from the background. A new thresholding structure called the decompose algorithm is proposed and compared against some existing single-stage algorithms. The decompose algorithm uses local feature vectors to analyse a local area and find the best approach to thresholding it. Instead of employing a single thresholding algorithm, it automatically selects an appropriate algorithm for each specific type of subregion of the document. The original image is recursively broken down into subregions using quad-tree decomposition until a suitable thresholding method can be applied to each subregion. The algorithm has been trained using 300 historical images and evaluated on 300 'difficult' document images, in which considerable background noise or variation in contrast and illumination exists. Quantitative analysis of the results by measuring text recall and qualitative assessment of processed document image quality are reported. The decompose algorithm is demonstrated to be effective at thresholding historical images of varying quality.
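    The recursive structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature vector, the candidate thresholding methods, or the selection rule, so everything below is an assumption made for illustration. The features (standard deviation and class separation), the parameters `flat_std`, `sep_ratio` and `min_size`, the method names, and the mean-based threshold are all hypothetical stand-ins.

```python
from statistics import mean, pstdev

def region_pixels(img, top, left, h, w):
    """Collect the grey levels of a rectangular subregion as a flat list."""
    return [img[r][c] for r in range(top, top + h) for c in range(left, left + w)]

def choose_method(pixels, flat_std=8.0, sep_ratio=4.0):
    """Hypothetical selection rule: return a method name, or None to request a split."""
    if pstdev(pixels) < flat_std:
        return "all_background"          # nearly uniform region: assume no text
    mu = mean(pixels)
    dark = [p for p in pixels if p < mu]
    light = [p for p in pixels if p >= mu]
    # Both lists are non-empty here because the region is not uniform.
    spread = max(pstdev(dark), pstdev(light), 1.0)
    if (mean(light) - mean(dark)) / spread >= sep_ratio:
        return "mean_split"              # two well-separated classes
    return None                          # no method fits yet: decompose further

def decompose(img, top, left, h, w, out, min_size=2):
    """Quad-tree decomposition: split until a method applies or the region is small."""
    pixels = region_pixels(img, top, left, h, w)
    method = choose_method(pixels)
    if method is None and h > min_size and w > min_size:
        h2, w2 = h // 2, w // 2
        for t, l, hh, ww in ((top, left, h2, w2), (top, left + w2, h2, w - w2),
                             (top + h2, left, h - h2, w2),
                             (top + h2, left + w2, h - h2, w - w2)):
            decompose(img, t, l, hh, ww, out, min_size)
        return
    if method == "all_background":
        return                           # output already initialised to 0
    threshold = mean(pixels)             # also the fallback at minimum size
    for r in range(top, top + h):
        for c in range(left, left + w):
            out[r][c] = 1 if img[r][c] < threshold else 0

def binarize(img):
    """Return a 0/1 image where 1 marks pixels classified as text (dark)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    decompose(img, 0, 0, h, w, out)
    return out

# Left half: bright background (200) with text at 100.
# Right half: darker background (90) with text at 10.
page = [[200, 200, 90, 90],
        [200, 100, 90, 10],
        [100, 200, 10, 90],
        [200, 200, 90, 90]]
result = binarize(page)
```

    In this toy image a single global threshold at the mean would mark the darker right-hand background as text, whereas splitting into quadrants lets each subregion be thresholded against its own local mean.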
  • Keywords
    feature extraction; image classification; image enhancement; quadtrees; degraded historical document image thresholding; image decompose algorithm; image quality; image subregion recursive breakdown; local feature vectors; local region classification; printed information separation; quad-tree decomposition; text recall; written information separation;
  • fLanguage
    English
  • Journal_Title
    Vision, Image and Signal Processing, IEE Proceedings -
  • Publisher
    iet
  • ISSN
    1350-245X
  • Type
    jour
  • DOI
    10.1049/ip-vis:20045054
  • Filename
    1520854