• DocumentCode
    1638929
  • Title
    Language Model Integration for the Recognition of Handwritten Medieval Documents
  • Author
    Wuthrich, Manuel ; Liwicki, Marcus ; Fischer, Andreas ; Indermuhle, E. ; Bunke, Horst ; Viehhauser, Gabriel ; Stolz, Michael
  • Author_Institution
    Inst. of Comput. Sci. & Appl. Math., Univ. of Bern, Bern, Switzerland
  • fYear
    2009
  • Firstpage
    211
  • Lastpage
    215
  • Abstract
    Building recognition systems for historical documents is a difficult task, especially when it comes to medieval scripts. The difficulty stems mainly from the poor quality and the small quantity of the available data. In this paper we apply an HMM-based recognition system to medieval manuscripts from the 13th century written in Middle High German. The recognition system, which was originally developed for modern scripts, has been adapted to medieval scripts. Besides the data processing, one of the major challenges is to create a suitable language model. Because of the lack of appropriate independent text corpora for medieval languages, the language model has to be created on the basis of only a rather small number of manuscripts. Due to the small size of the corpus, optimizing the language model parameters can quickly lead to overfitting. In this paper we describe a strategy to integrate all available information into the language model and to optimize the language model parameters without suffering from this problem.
  • Keywords
    handwriting recognition; handwritten character recognition; hidden Markov models; history; text analysis; HMM based recognition system; handwritten medieval documents recognition; historical documents recognition systems; independent text corpora; language model integration; language model parameters; medieval languages; medieval manuscripts; Computer science; Data processing; Digital images; Handwriting recognition; Hidden Markov models; Mathematics; Natural languages; Software libraries; Text analysis; Writing; HMM; Historical Documents; Language Model; Overfitting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.17
  • Filename
    5277727