Title :
Character recognition of medieval English manuscripts supported by a word frequency table
Author :
Kei Tanaka;Kengo Terasawa
Author_Institution :
Department of Media Architecture, Future University Hakodate, 116-2 Kameda-Nakanocho, Hakodate, Hokkaido, 041-8655 Japan
Abstract :
This paper proposes a method for reducing the effort involved in transcribing historical documents. The method consists of preprocessing, line and word segmentation, and word clustering stages. In the line segmentation stage, the borders around lines are determined using dynamic programming so as to avoid the influence of letter ascenders and descenders. In the word clustering stage, we propose a novel method, based on hierarchical cluster analysis, that uses a word frequency table as supplementary information. The effectiveness of the proposed method is evaluated experimentally by comparing it with a baseline method that does not use a word frequency table. The experiments confirmed that the proposed method outperforms the baseline method.
Keywords :
"Conferences","Pattern recognition"
Conference_Title :
2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)
Electronic_ISBN :
2327-0985
DOI :
10.1109/ACPR.2015.7486593
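
Illustrative_Sketch :
The abstract says that line borders are found with dynamic programming so that ascenders and descenders do not disturb them, but it gives no further detail. The sketch below is one common way to realize that idea, not the authors' implementation: it assumes a binarized page (1 = ink, 0 = background) and searches for a left-to-right path that crosses as little ink as possible, moving at most one row up or down per column. The function name `find_line_border`, the cost definition, and the toy `page` are all assumptions made for illustration.

```python
def find_line_border(ink):
    """Return one row index per column for a minimum-ink left-to-right path.

    `ink` is a list of rows, each a list of 0/1 values (1 = ink pixel).
    The path may shift by at most one row between neighbouring columns,
    which lets it bend around ascenders and descenders.
    """
    n_rows, n_cols = len(ink), len(ink[0])

    # cost[r][c]: least ink crossed by any path ending at row r, column c.
    cost = [[0] * n_cols for _ in range(n_rows)]
    # back[r][c]: row used in column c - 1 by that cheapest path.
    back = [[r] * n_cols for r in range(n_rows)]

    for r in range(n_rows):
        cost[r][0] = ink[r][0]

    for c in range(1, n_cols):
        for r in range(n_rows):
            # Candidate predecessor rows: same row or a direct neighbour.
            candidates = [p for p in (r - 1, r, r + 1) if 0 <= p < n_rows]
            best_prev = min(candidates, key=lambda p: cost[p][c - 1])
            cost[r][c] = cost[best_prev][c - 1] + ink[r][c]
            back[r][c] = best_prev

    # Start from the cheapest end point and trace the path back to column 0.
    row = min(range(n_rows), key=lambda r: cost[r][n_cols - 1])
    path = [row]
    for c in range(n_cols - 1, 0, -1):
        row = back[row][c]
        path.append(row)
    path.reverse()
    return path


# Toy page: rows 0-1 and row 4 are text; rows 2-3 form the gap between lines.
# A descender intrudes into row 2 at column 3, and an ascender into row 3
# at column 0, so no straight horizontal cut is ink-free.
page = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
border = find_line_border(page)
crossed = sum(page[r][c] for c, r in enumerate(border))
print(border, crossed)
```

On the toy page the only ink-free borders start in row 2 and drop to row 3 at column 3, which is the kind of detour a straight projection-profile cut cannot make. A real page would likely need a cost that also penalizes vertical movement, which this sketch omits.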