  • DocumentCode
    591983
  • Title
    Evolution Maps for Connected Components in Text Documents
  • Author
    Biller, Ofer; Kedem, Klara; Dinstein, Itshak; El-Sana, Jihad
  • Author_Institution
    Ben-Gurion Univ., Beer-Sheva, Israel
  • fYear
    2012
  • fDate
    18-20 Sept. 2012
  • Firstpage
    405
  • Lastpage
    410
  • Abstract
    For highly degraded text documents, common tasks such as binarization and line extraction remain difficult. Equipped with reliable information about the distribution of character dimensions in the document, one can significantly improve the results of these algorithms. We introduce a novel perspective on the image data which maps the evolution of connected components as the gray-scale threshold changes. We use these maps to provide a robust algorithm for extracting information about character dimensions in degraded documents, and demonstrate improvement in binarization results using this information. We statistically analyze the characteristics of the evolution maps for text documents and compare our results with ground truth data.
  • Keywords
    character recognition; document image processing; text analysis; binarization; character dimensions; connected components; evolution maps; gray scale threshold; ground truth data; image data; information extraction; line extraction; reliable information; robust algorithm; text document; Degradation; Educational institutions; Estimation; Histograms; Noise; Robustness; connected components analysis; degraded documents
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
  • Conference_Location
    Bari
  • Print_ISBN
    978-1-4673-2262-1
  • Type
    conf
  • DOI
    10.1109/ICFHR.2012.201
  • Filename
    6424427
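The abstract's core idea, tracking how connected components appear, grow, and merge as the gray-scale threshold changes, can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes dark ink on a light background, 4-connectivity, and bounding-box height as the character dimension. Under those assumptions it records an "evolution map" as counts of components per (threshold, height) pair. The names `component_heights`, `evolution_map`, and `dominant_height` are hypothetical, and the most-frequent-height estimator is a naive stand-in for the paper's robust algorithm.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Gray levels 0..255; assumption: dark ink (low values) on light paper.
Image = List[List[int]]


def component_heights(image: Image, threshold: int) -> List[int]:
    """Bounding-box heights of 4-connected components of pixels <= threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    heights: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] > threshold:
                continue
            # Breadth-first flood fill of one foreground component.
            seen[r][c] = True
            queue = deque([(r, c)])
            top = bottom = r
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            heights.append(bottom - top + 1)
    return heights


def evolution_map(image: Image, thresholds: Sequence[int]) -> Dict[Tuple[int, int], int]:
    """Hypothetical evolution map: number of components per (threshold, height)."""
    counts: Dict[Tuple[int, int], int] = {}
    for t in thresholds:
        for h in component_heights(image, t):
            counts[(t, h)] = counts.get((t, h), 0) + 1
    return counts


def dominant_height(emap: Dict[Tuple[int, int], int]) -> int:
    """Naive estimate: height with the largest count summed over all thresholds."""
    totals: Dict[int, int] = {}
    for (_, h), n in emap.items():
        totals[h] = totals.get(h, 0) + n
    # Highest total wins; ties go to the smaller height.
    return max(totals, key=lambda h: (totals[h], -h))


# Toy page: two height-3 ink strokes (gray 40) and one faint speck (gray 120).
toy = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255,  40, 255, 255,  40,  40, 255, 255],
    [255,  40, 255, 255,  40, 255, 255, 255],
    [255,  40, 255, 255,  40,  40, 255, 120],
    [255, 255, 255, 255, 255, 255, 255, 255],
]
emap = evolution_map(toy, thresholds=[50, 100, 150, 200])
print(dominant_height(emap))  # prints 3
```

Pixels at or below each threshold count as foreground, so the faint speck joins the map only at thresholds of 150 and above, while the two strokes contribute height-3 components at every threshold. Summing over thresholds therefore lets the stable character height outweigh transient noise, which is one simple way to read "evolution" across thresholds.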