DocumentCode
591983
Title
Evolution Maps for Connected Components in Text Documents
Author
Biller, Ofer ; Kedem, Klara ; Dinstein, Itshak ; El-Sana, Jihad
Author_Institution
Ben-Gurion Univ., Beer-Sheva, Israel
fYear
2012
fDate
18-20 Sept. 2012
Firstpage
405
Lastpage
410
Abstract
For highly degraded text documents, common tasks such as binarization and line extraction remain difficult. Equipped with reliable information about the distribution of character dimensions in the document, one can significantly improve the results of these algorithms. We introduce a novel perspective on the image data that maps the evolution of connected components as the gray scale threshold changes. We use these maps to provide a robust algorithm for extracting information about character dimensions in degraded documents, and demonstrate improved binarization results using this information. We statistically analyze the characteristics of the evolution maps for text documents and compare our results with ground truth data.
Keywords
character recognition; document image processing; text analysis; binarization; character dimensions; connected components; evolution maps; gray scale threshold; ground truth data; image data; information extraction; line extraction; reliable information; robust algorithm; text document; Degradation; Educational institutions; Estimation; Histograms; Noise; Robustness; binarization; connected components analysis; degraded documents
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location
Bari
Print_ISBN
978-1-4673-2262-1
Type
conf
DOI
10.1109/ICFHR.2012.201
Filename
6424427
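
The idea in the abstract, tracking how connected components change as the gray scale threshold moves, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the maps are built, so the sketch assumes dark text on a light background, 4-connectivity, and a map defined as one histogram of component heights per threshold. The names `component_heights` and `evolution_map` are hypothetical.

```python
from collections import deque


def component_heights(image, threshold):
    """Return the bounding-box heights of 4-connected dark components.

    Assumption: a pixel is foreground when its gray value <= threshold.
    `image` is a list of equal-length rows of integer gray values.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    heights = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] > threshold:
                continue
            # Breadth-first flood fill over one component,
            # tracking its topmost and bottommost rows.
            top = bottom = r
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            heights.append(bottom - top + 1)
    return heights


def evolution_map(image, thresholds):
    """Map each threshold to a histogram {component height: count}."""
    emap = {}
    for t in thresholds:
        hist = {}
        for h in component_heights(image, t):
            hist[h] = hist.get(h, 0) + 1
        emap[t] = hist
    return emap


# Toy image: two dark vertical strokes joined by a faint gray bridge.
demo_image = [
    [255, 255, 255, 255, 255],
    [255,  10, 255,  10, 255],
    [255,  10, 200,  10, 255],
    [255,  10, 255,  10, 255],
    [255, 255, 255, 255, 255],
]
demo_map = evolution_map(demo_image, [5, 100, 220])
```

In the toy image the two strokes are separate components at threshold 100 and merge into one once the threshold passes the bridge value of 200. Changes of this kind across thresholds are what such a map is meant to expose.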