DocumentCode
3488147
Title
Category-Based Language Models for Handwriting Recognition of Marriage License Books
Author
Romero, Veronica; Sanchez, Joan Andreu
Author_Institution
Dept. de Sist. Informaticos y Comput., Univ. Politècnica de València, Valencia, Spain
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
788
Lastpage
792
Abstract
Handwritten marriage license books have been used for centuries by ecclesiastical institutions to register marriages. These documents contain information useful for demography studies, organized as a list of individual marriage license records, much like an accounting book. The information in these books is usually collected by expert demographers who devote a lot of time to transcribing them. Despite the structure of the text, automatic transcription and semantic information extraction from these documents are quite difficult because of the distinct and evolving vocabulary, which is composed mainly of proper names that change over time. In this paper, we define a set of categories that take into account the semantic information included in the licenses. A category-based language model is then generated and integrated into the handwritten text recognition system. We study how the use of these categories can benefit not only the handwriting recognition step, but also the subsequent semantic information extraction and knowledge discovery.
Keywords
data mining; document image processing; handwriting recognition; accounting book; category-based language models; demography studies; document transcription; ecclesiastical institutions; evolutionary vocabulary; handwriting recognition; handwritten text recognition system; knowledge discovery; marriage license books; marriage license records; semantic information extraction; Data mining; Databases; Handwriting recognition; Hidden Markov models; Licenses; Semantics; Training; Category-based language models; Handwriting marriage license books; Handwriting recognition; information extraction;
fLanguage
English
Publisher
ieee
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.161
Filename
6628726