Title :
Semantics-based content extraction in typewritten historical documents
Author :
Antonacopoulos, A. ; Karatzas, D.
Author_Institution :
PRImA Lab., Salford Univ., UK
fDate :
29 Aug.-1 Sept. 2005
Abstract :
This paper presents a flexible approach to extracting content from scanned historical documents using semantic information. The final electronic document is the result of a "digital historical document lifecycle" process, where the expert knowledge of the historian/archivist user is incorporated at different stages. Results show that such a conversion strategy, aided by (expert) user-specified semantic information and enabling individual parts of the document to be processed in a specialised way, produces results that are superior in a variety of significant ways to those of document analysis and understanding techniques devised for contemporary documents.
Keywords :
document handling; feature extraction; optical character recognition; contemporary document; document analysis; semantics-based content extraction; typewritten historical documents; user-specified semantic information; Aging; Data mining; Degradation; Image analysis; Image converters; Image recognition; Image segmentation; Information analysis; Text analysis; Writing
Conference_Title :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
Print_ISBN :
0-7695-2420-6
DOI :
10.1109/ICDAR.2005.215