• DocumentCode
    2502766
  • Title

    A Self-Training Learning Document Binarization Framework

  • Author

    Bolan Su ; Shijian Lu ; Tan, Chew Lim

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    3187
  • Lastpage
    3190
  • Abstract
    Document image binarization techniques have been studied for many years, and many practical binarization techniques have been developed and applied successfully in commercial document analysis systems. However, current state-of-the-art methods fail to produce good binarization results for many badly degraded document images. In this paper, we propose a self-training learning framework for document image binarization. Building on reported binarization methods, the proposed framework first divides document image pixels into three categories, namely foreground pixels, background pixels, and uncertain pixels. A classifier is then trained by learning from the document image pixels in the foreground and background categories. Finally, the uncertain pixels are classified using the learned pixel classifier. Extensive experiments have been conducted on the dataset used in the recent Document Image Binarization Contest (DIBCO) 2009. Experimental results show that the proposed framework significantly improves the performance of reported document image binarization methods.
  • Keywords
    document image processing; learning (artificial intelligence); pattern classification; document image binarization contest; learned pixel classifier; self training learning document binarization; Histograms; Lighting; Pixel; Testing; Text analysis; document image binarization; image pixel classification; self-training learning framework
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2010 20th International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type

    conf

  • DOI
    10.1109/ICPR.2010.780
  • Filename
    5597185
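
The three steps named in the abstract (split pixels into foreground, background, and uncertain; learn from the two confident categories; classify the uncertain pixels) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the threshold-with-margin rule used for the initial split, the nearest-class-mean classifier used as a stand-in for the learned pixel classifier, and the names `initial_labels`, `self_train_binarize`, `threshold`, and `margin`.

```python
import numpy as np

FOREGROUND, BACKGROUND, UNCERTAIN = 0, 1, 2


def initial_labels(gray, threshold, margin):
    """Split pixels into three categories around a base threshold.

    Assumed rule: pixels well below the threshold are confident foreground
    (dark text), pixels well above it are confident background, and the
    band in between is left uncertain.
    """
    labels = np.full(gray.shape, UNCERTAIN, dtype=np.uint8)
    labels[gray < threshold - margin] = FOREGROUND
    labels[gray > threshold + margin] = BACKGROUND
    return labels


def self_train_binarize(gray, threshold=128.0, margin=30.0):
    """Return a boolean mask that is True where a pixel is foreground."""
    gray = np.asarray(gray, dtype=np.float64)
    labels = initial_labels(gray, threshold, margin)

    fg = gray[labels == FOREGROUND]
    bg = gray[labels == BACKGROUND]
    if fg.size == 0 or bg.size == 0:
        # Nothing to learn from: fall back to the plain threshold.
        return gray < threshold

    # "Training" step (stand-in): learn one mean intensity per confident class.
    fg_mean, bg_mean = fg.mean(), bg.mean()

    # Classify only the uncertain pixels by the nearer class mean.
    result = labels == FOREGROUND
    uncertain = labels == UNCERTAIN
    closer_to_fg = np.abs(gray - fg_mean) < np.abs(gray - bg_mean)
    result[uncertain] = closer_to_fg[uncertain]
    return result


if __name__ == "__main__":
    page = np.array([[10, 20, 240], [250, 110, 150]])
    print(self_train_binarize(page))
```

The point of the sketch is that the classifier is fitted to the image being binarized, using only pixels the initial method is confident about, so no externally labeled training data is involved. A practical version would presumably use richer features than raw intensity.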