Title :
Ancient document analysis based on text line extraction
Author :
Kleber, Florian ; Sablatnig, Robert ; Gau, Melanie ; Miklas, Heinz
Author_Institution :
Inst. of Comput. Aided Autom., Vienna Univ. of Technol., Vienna
Abstract :
To preserve our cultural heritage and to enable automated document processing, libraries and national archives have started digitizing historical documents. In degraded manuscripts (e.g. damaged by mold, humidity, or bad storage conditions), the text or parts of it can disappear. The remaining parts of the text can be segmented, and the ruling can be extrapolated using a priori knowledge. Since the ruling defines the position of the text within a page, it can be used for layout analysis and as a basis for enhancing readability. Furthermore, information about the scribe (hand) of the manuscript and its spatiotemporal origin can be gained by analyzing the ruling. This paper presents an algorithm for ruling estimation of Glagolitic texts based on text line extraction. It is suitable for degraded manuscripts because it extrapolates the baselines using a priori knowledge of the ruling. The algorithm was tested on 30 pages of the Missale Sinaiticum, and the evaluation was based on visual criteria.
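The extrapolation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It assumes that the ruling consists of evenly spaced horizontal lines, that the number of ruled lines per page is known a priori, that the topmost detected baseline lies on the first ruled line, and that at least one pair of adjacent baselines survived. The function name `extrapolate_ruling` and its parameters are hypothetical.

```python
import numpy as np

def extrapolate_ruling(detected_y, n_lines):
    """Fit evenly spaced ruling lines y_k = y0 + k * d to detected baselines.

    detected_y: y-coordinates of baselines that survived degradation
                (hypothetical input, e.g. from a text line extractor).
    n_lines:    number of ruled lines per page, assumed known a priori.
    """
    y = np.sort(np.asarray(detected_y, dtype=float))
    # Rough line spacing: the smallest gap between neighbouring detections.
    # Assumption: at least one pair of adjacent lines was detected.
    d0 = np.diff(y).min()
    # Give each detected baseline an integer line index, counted from the
    # topmost detection (assumed to lie on the first ruled line).
    k = np.rint((y - y[0]) / d0).astype(int)
    # Least-squares fit of spacing d and offset y0 over the indexed baselines.
    d, y0 = np.polyfit(k, y, 1)
    # Evaluate the fitted grid for every ruled line, including missing ones.
    return y0 + d * np.arange(n_lines)

# Lines 0, 1, 3 and 4 were detected; lines 2 and 5 are lost to degradation.
ruling = extrapolate_ruling([100, 150, 250, 300], n_lines=6)
```

Under these assumptions the fitted grid also supplies positions for the lines whose text has disappeared. A real manuscript page would additionally need handling of skew and of uneven ruling.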
Keywords :
document handling; Glagolitic texts; Missale Sinaiticum; ancient document analysis; automated document processing libraries; degraded manuscripts; historical documents; national archives; text line extraction; Algorithm design and analysis; Clustering algorithms; Data mining; Degradation; Image analysis; Image segmentation; Information analysis; Software libraries; Spatiotemporal phenomena; Text analysis
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
Print_ISBN :
978-1-4244-2174-9
ISSN :
1051-4651
DOI :
10.1109/ICPR.2008.4761530