DocumentCode :
3488147
Title :
Category-Based Language Models for Handwriting Recognition of Marriage License Books
Author :
Romero, Veronica ; Sanchez, Joan Andreu
Author_Institution :
Dept. de Sist. Informaticos y Comput., Univ. Politècnica de València, Valencia, Spain
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
788
Lastpage :
792
Abstract :
Handwritten marriage license books have been used for centuries by ecclesiastical institutions to register marriages. These documents contain information useful for demographic studies, organized as a list of individual marriage license records, much like an accounting book. The information in these books is usually collected by expert demographers who devote a great deal of time to transcribing them. Despite the regular structure of the text, automatic transcription and semantic information extraction for these documents are quite difficult because of the distinctive, evolving vocabulary, which is composed mainly of proper names that change over time. In this paper, we define a set of categories based on the semantic information contained in the licenses. A category-based language model is then generated and integrated into the handwritten text recognition system. We study how the use of these categories can benefit not only the handwriting recognition step but also the subsequent semantic information extraction and knowledge discovery.
Keywords :
data mining; document image processing; handwriting recognition; accounting book; category-based language models; demography studies; document transcription; ecclesiastical institutions; evolutionary vocabulary; handwritten text recognition system; knowledge discovery; marriage license books; marriage license records; semantic information extraction; Databases; Hidden Markov models; Licenses; Semantics; Training; Handwriting marriage license books; information extraction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.161
Filename :
6628726