Title :
Keyword Matching in Historical Machine-Printed Documents Using Synthetic Data, Word Portions and Dynamic Time Warping
Author :
Konidaris, T. ; Gatos, B. ; Perantonis, S.J. ; Kesidis, A.
Author_Institution :
Comput. Intell. Lab., Nat. Center for Sci. Res. "Demokritos", Athens
Abstract :
In this paper we propose a novel and efficient technique for finding keywords typed by the user in digitised machine-printed historical documents using the dynamic time warping (DTW) algorithm. The method uses word portions located at the beginning and end of each segmented word of the processed documents and tries to estimate the position of the first and last characters in order to reduce the list of candidate words. Since DTW can become computationally intensive on large datasets, the proposed method significantly prunes the list of candidate words, thus speeding up the entire process. Word length is also used as a means of further reducing the data to be processed. Results are improved in terms of time and efficiency compared to those produced when the list of candidate words is not pruned.
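The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each word image has already been reduced to a 1-D feature sequence (for example, a column profile), and the names `dtw_distance`, `match_keyword`, `length_tol`, `portion` and `portion_thresh`, as well as the threshold values, are hypothetical. Candidates are first filtered by length, then by a cheap DTW comparison of their beginning and end portions, and only the survivors are compared with full DTW.

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def match_keyword(query, candidates, length_tol=0.2, portion=0.25,
                  portion_thresh=1.0):
    """Rank candidates by full DTW cost after two pruning steps.

    All parameters are illustrative assumptions, not values from the paper.
    Returns a sorted list of (dtw_cost, candidate_index) pairs.
    """
    q_len = len(query)
    k = max(1, int(q_len * portion))  # size of the start/end portions
    survivors = []
    for idx, cand in enumerate(candidates):
        # Pruning step 1: discard words whose length differs too much.
        if abs(len(cand) - q_len) > length_tol * q_len:
            continue
        # Pruning step 2: compare only the beginning and end portions.
        if dtw_distance(query[:k], cand[:k]) > portion_thresh:
            continue
        if dtw_distance(query[-k:], cand[-k:]) > portion_thresh:
            continue
        # Full DTW is computed only for words that survive both steps.
        survivors.append((dtw_distance(query, cand), idx))
    return sorted(survivors)


query = [0, 1, 2, 3, 4, 3, 2, 1]
candidates = [
    [0, 1, 2, 3, 4, 3, 2, 1],     # identical word
    [0, 1, 1, 2, 3, 4, 3, 2, 1],  # same shape, slightly stretched
    [5, 5, 5, 5, 5, 5, 5, 5],     # similar length, different start and end
    [0, 1, 2],                    # far too short
]
ranked = match_keyword(query, candidates)
```

In this toy run the last two candidates are rejected before any full DTW computation, which is where the time saving would come from on a large collection of segmented words.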
Keywords :
document handling; digitised machine-printed historical documents; dynamic time warping; historical machine-printed documents; keyword matching; synthetic data; word length; word portions; Character recognition; Computational intelligence; Histograms; Image segmentation; Informatics; Laboratories; Optical character recognition software; Optical feedback; Text analysis; Typesetting; Dynamic Time Warping; Historical Documents; Indexing;
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
DOI :
10.1109/DAS.2008.64