• DocumentCode
    591974
  • Title
    Statistical Machine Translation as a Language Model for Handwriting Recognition
  • Author
    Devlin, John ; Kamali, M. ; Subramanian, Kartick ; Prasad, Ranga ; Natarajan, Prem
  • Author_Institution
    Raytheon BBN Technol., Cambridge, MA, USA
  • fYear
    2012
  • fDate
    18-20 Sept. 2012
  • Firstpage
    291
  • Lastpage
    296
  • Abstract
    When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model, which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language, and then create a feature score based on how "difficult" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we were able to obtain a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
  • Keywords
    handwriting recognition; language translation; natural language processing; Arabic handwriting recognition task; OCR hypothesis; feature score; likelihood score; natural language text; overlapping chunks; reranking feature; standard n-gram model; statistical machine translation system; word error rate; word level language model; Buildings; Computational modeling; Hidden Markov models; Optical character recognition software; Training; Viterbi algorithm; machine translation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
  • Conference_Location
    Bari
  • Print_ISBN
    978-1-4673-2262-1
  • Type
    conf
  • DOI
    10.1109/ICFHR.2012.273
  • Filename
    6424408
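
The reranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` fields, the feature weights, and `toy_translation_score` (a stand-in for a real statistical machine translation likelihood) are all assumptions made for illustration. Each n-best hypothesis receives a weighted sum of its recognizer score, its n-gram LM score, and a translation-based score, and the list is sorted by that total.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    text: str
    recognizer_score: float  # log-likelihood from the recognizer (assumed field)
    ngram_score: float       # log-probability under the n-gram LM (assumed field)


def rerank(
    hypotheses: List[Hypothesis],
    translation_score: Callable[[str], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[Hypothesis]:
    """Sort hypotheses by a weighted sum of three log-domain feature scores."""
    w_rec, w_lm, w_mt = weights

    def total(h: Hypothesis) -> float:
        return (
            w_rec * h.recognizer_score
            + w_lm * h.ngram_score
            + w_mt * translation_score(h.text)
        )

    return sorted(hypotheses, key=total, reverse=True)


# Hypothetical stand-in for an SMT likelihood: penalize each word that a tiny
# "translatable" vocabulary does not cover. A real system would return the
# translation model's score for the whole sentence instead.
KNOWN_WORDS = {"the", "cat", "sat"}


def toy_translation_score(text: str) -> float:
    return -float(sum(1 for word in text.split() if word not in KNOWN_WORDS))


n_best = [
    Hypothesis("the cat sat", recognizer_score=-10.0, ngram_score=-5.0),
    Hypothesis("the cat sad", recognizer_score=-9.5, ngram_score=-5.2),
]

best_without_mt = rerank(n_best, lambda text: 0.0)[0]
best_with_mt = rerank(n_best, toy_translation_score)[0]
```

In this toy setup the translation feature acts only as an extra term on top of the existing scores, which mirrors the abstract's description of using the new model in addition to the standard n-gram LM. In practice the weights would be tuned on held-out data.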