• DocumentCode
    3488147
  • Title
    Category-Based Language Models for Handwriting Recognition of Marriage License Books
  • Author
    Romero, Veronica; Andreu Sanchez, Joan
  • Author_Institution
    Dept. de Sist. Informáticos y Comput., Univ. Politècnica de València, Valencia, Spain
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    788
  • Lastpage
    792
  • Abstract
    Handwritten marriage license books have been used for centuries by ecclesiastical institutions to register marriages. These documents contain information useful for demographic studies, organized as a list of individual marriage license records, much like an accounting book. The information in these books is usually collected by expert demographers, who devote a great deal of time to transcribing them. Despite the regular structure of the text, automatic transcription and semantic information extraction for these documents are quite difficult because of the distinctive, evolving vocabulary, which consists mainly of proper names that change over time. In this paper, we define a set of categories based on the semantic information contained in the licenses. A category-based language model is then built and integrated into the handwritten text recognition system. We study how using these categories can benefit not only the handwriting recognition step but also the subsequent semantic information extraction and knowledge discovery.
  • Keywords
    data mining; document image processing; handwriting recognition; accounting book; category-based language models; demography studies; document transcription; ecclesiastical institutions; evolutionary vocabulary; handwritten text recognition system; knowledge discovery; marriage license books; marriage license records; semantic information extraction; Data mining; Databases; Handwriting recognition; Hidden Markov models; Licenses; Semantics; Training; Category-based language models; Handwriting marriage license books; information extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 12th International Conference on Document Analysis and Recognition (ICDAR)
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.161
  • Filename
    6628726