DocumentCode
183435
Title
Semiautomatic Text Baseline Detection in Large Historical Handwritten Documents
Author
Bosch, Vicente ; Toselli, Alejandro Hector ; Vidal, Enrique
Author_Institution
PRHLT Res. Center, Univ. Politec. Valencia, Valencia, Spain
fYear
2014
fDate
1-4 Sept. 2014
Firstpage
690
Lastpage
695
Abstract
A semiautomatic iterative process for the detection of text baselines in historical handwritten document images is presented. It relies on the use of Hidden Markov Models (HMMs) to provide initial text baseline hypotheses, followed by user review in order to produce ground-truth quality results. Using the set of revised baselines as ground truth, the HMMs are re-trained before processing the next batch of pages. This process has been evaluated in the context of a real transcription task which, as a by-product, has produced line-detection ground truth. We show that a formal, HMM-based line-detection approach, which requires training data, not only yields good detection results but is also of practical use in large handwritten image collections. Through experiments with real users we show that the proposed approach offers accuracy, scalability and ease of use, as well as low overall human effort requirements.
Keywords
document image processing; handwritten character recognition; hidden Markov models; text detection; HMM-based line-detection approach; hidden Markov model; historical handwritten document image; semiautomatic iterative process; semiautomatic text baseline detection; training data; Accuracy; Feature extraction; Hidden Markov models; Image segmentation; Layout; Training; Vectors; baseline detection; ground truth creation; process
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location
Heraklion
ISSN
2167-6445
Print_ISBN
978-1-4799-4335-7
Type
conf
DOI
10.1109/ICFHR.2014.121
Filename
6981100