• DocumentCode
    1635391
  • Title
    Clutter Noise Removal in Binary Document Images
  • Author
    Agrawal, Mudit ; Doermann, David
  • Author_Institution
    Inst. of Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
  • fYear
    2009
  • Firstpage
    556
  • Lastpage
    560
  • Abstract
    The paper presents a clutter detection and removal algorithm for complex document images. The distance-transform-based approach is independent of the clutter's position, size, shape, and connectivity with text. Features are computed from a residual image obtained by analyzing the distance transform, and clutter elements, if present, are identified with an SVM classifier. Removal is restrictive, so text attached to the clutter is not deleted in the process. The method was tested on a collection of degraded and noisy, machine-printed and handwritten Arabic and English text documents. Results show pixel-level accuracies of 97.5% for clutter detection and 95% for clutter removal. The approach was also extended with a noise detection and removal model for documents containing a mix of clutter and salt-and-pepper noise.
  • Keywords
    document image processing; image classification; image denoising; support vector machines; text analysis; SVM classifier; binary document image; clutter noise removal; distance transform; noise detection; Algorithm design and analysis; Background noise; Educational institutions; Image analysis; Image recognition; Ink; Multi-stage noise shaping; Noise shaping; Shape; clutter; ink blobs; marginal noise; noise removal
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.277
  • Filename
    5277594
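
The abstract describes a distance-transform-based approach that removes clutter without deleting the text attached to it. The following is a minimal sketch of that general idea only, not the authors' algorithm: it assumes a chessboard distance transform, a hand-picked threshold in place of the residual-image features and SVM classifier, and pixels outside the image treated as non-background. The function names and the `max_half_stroke` parameter are hypothetical.

```python
def chessboard_distance_transform(img):
    """Chessboard distance from each foreground pixel (1) to the nearest
    background pixel (0), computed with a two-pass chamfer scan.
    Pixels outside the image are not treated as background (assumption)."""
    h, w = len(img), len(img[0])
    inf = h + w  # larger than any possible chessboard distance in the image
    dist = [[inf if v else 0 for v in row] for row in img]

    def relax(y, x, offsets):
        best = dist[y][x]
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                best = min(best, dist[ny][nx] + 1)
        dist[y][x] = best

    forward = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]
    backward = [(1, 1), (1, 0), (1, -1), (0, 1)]
    for y in range(h):
        for x in range(w):
            relax(y, x, forward)
    for y in reversed(range(h)):
        for x in reversed(range(w)):
            relax(y, x, backward)
    return dist


def remove_clutter(img, max_half_stroke):
    """Clear blobs thicker than a text stroke (hypothetical heuristic).

    A pixel whose distance value exceeds max_half_stroke is taken as part
    of a clutter core. Each core pixel with distance d is regrown into the
    square of radius d - 1 around it, which by definition of the distance
    transform holds only foreground pixels, and that square is cleared.
    Thin strokes attached to the blob never reach the threshold and are
    not inside any regrown square, so they are kept."""
    h, w = len(img), len(img[0])
    dist = chessboard_distance_transform(img)
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            d = dist[y][x]
            if d > max_half_stroke:
                r = d - 1
                for ny in range(max(0, y - r), min(h, y + r + 1)):
                    for nx in range(max(0, x - r), min(w, x + r + 1)):
                        out[ny][nx] = 0
    return out
```

Regrowing each core pixel by its own distance value, rather than dilating by a fixed amount, is what keeps the removal restrictive in this sketch: the cleared region cannot extend past the blob into a stroke that touches it. A real system would replace the fixed threshold with a learned decision, as the abstract indicates.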