DocumentCode :
2010988
Title :
Binarization-Free Text Line Segmentation for Historical Documents Based on Interest Point Clustering
Author :
Garz, Angelika ; Fischer, Andreas ; Sablatnig, Robert ; Bunke, Horst
Author_Institution :
Comput. Vision Lab., Vienna Univ. of Technol., Vienna, Austria
fYear :
2012
fDate :
27-29 March 2012
Firstpage :
95
Lastpage :
99
Abstract :
Segmenting page images into text lines is a crucial pre-processing step for the automated reading of historical documents. Challenging issues in this open research field include paper or parchment background noise, ink bleed-through, artifacts due to aging, stains, and touching text lines. In this paper, we present a novel binarization-free line segmentation method that is robust to noise and copes with overlapping and touching text lines. First, interest points representing parts of characters are extracted from gray-scale images. Next, word clusters are identified in high-density regions, and touching components such as ascenders and descenders are separated using seam carving. Finally, text lines are generated by concatenating neighboring word clusters, where neighborhood is defined by the prevailing orientation of the words in the document. An experimental evaluation on the Latin manuscript images of the Saint Gall database shows promising results for real-world applications in terms of both accuracy and efficiency.
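The abstract mentions separating touching components with seam carving. Below is a minimal sketch of a horizontal minimum-energy seam computed by dynamic programming on a gray-scale image stored as a list of rows. The function name `horizontal_seam` and the energy definition (ink darkness, 255 minus the gray value, so the seam prefers bright background) are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def horizontal_seam(gray: List[List[int]]) -> List[int]:
    """Return one row index per column for a minimum-energy left-to-right seam.

    Hypothetical helper: energy is taken as ink darkness (255 - gray value),
    so the seam prefers bright background between two text lines. The
    paper's actual energy function may differ.
    """
    rows, cols = len(gray), len(gray[0])
    energy = [[255 - gray[r][c] for c in range(cols)] for r in range(rows)]

    # cost[r][c]: lowest cumulative energy of a seam ending at pixel (r, c)
    # back[r][c]: row of that seam's pixel in column c - 1
    cost = [[0] * cols for _ in range(rows)]
    back = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        cost[r][0] = energy[r][0]

    for c in range(1, cols):
        for r in range(rows):
            # The seam is 8-connected: it may move at most one row per column.
            candidates = [p for p in (r - 1, r, r + 1) if 0 <= p < rows]
            best = min(candidates, key=lambda p: cost[p][c - 1])
            cost[r][c] = energy[r][c] + cost[best][c - 1]
            back[r][c] = best

    # Start from the cheapest pixel in the last column and walk back.
    r = min(range(rows), key=lambda p: cost[p][cols - 1])
    seam = [0] * cols
    for c in range(cols - 1, -1, -1):
        seam[c] = r
        r = back[r][c]
    return seam


# Two dark "text lines" separated only by a bright gap in row 2.
page = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(horizontal_seam(page))  # [2, 2, 2, 2]
```

In this sketch the returned row indices mark a cut through the least inked pixels; pixels of a touching component above the seam would be assigned to the upper line and those below it to the lower line. How the paper constrains the seam to the region between two word clusters is not stated in this record.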
Keywords :
document image processing; image segmentation; pattern clustering; text analysis; automated reading; binarization free text line segmentation; gray-scale images; historical documents; interest point clustering; open research field; overlapping text lines; parchment background noise; touching text lines; Databases; Green products; Image segmentation; Layout; Merging; Noise; Robustness; ancient documents; binarization-free; handwritten; historical documents; manuscripts; text line segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
Type :
conf
DOI :
10.1109/DAS.2012.23
Filename :
6195342