• DocumentCode
    1389209
  • Title

    Integration of Statistical Models for Dictation of Document Translations in a Machine-Aided Human Translation Task

  • Author

    Reddy, Aarthi ; Rose, Richard C.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., McGill Univ., Montréal, QC, Canada
  • Volume
    18
  • Issue
    8
  • fYear
    2010
  • Firstpage
    2015
  • Lastpage
    2027
  • Abstract
    This paper presents a model for machine-aided human translation (MAHT) that integrates source language text and target language acoustic information to produce the text translation of a source language document. It is evaluated on a scenario where a human translator dictates a first-draft target language translation of a source language document. Information obtained from the source language document, including translation probabilities derived from statistical machine translation (SMT) and named entity tags derived from named entity recognition (NER), is combined with acoustic-phonetic information obtained from an automatic speech recognition (ASR) system. One advantage of the system combination used here is that words not included in the ASR vocabulary can be correctly decoded by the combined system. The MAHT model and system implementation are presented. The combined system is shown to achieve a relative decrease in word error rate of 29% compared with the baseline ASR system on a French-to-English document translation task in the Hansard domain. In addition, transcriptions obtained using the combined system show a relative increase in NIST score of 34% compared with transcriptions obtained from the baseline ASR system.
  • Keywords
    language translation; speech recognition; statistical analysis; text analysis; ASR vocabulary; French to English document translation task; Hansard domain; MAHT model; acoustic phonetic information; automatic speech recognition system; document translations; machine-aided human translation task; named entity recognition; named entity tags; source language document; source language text integration; statistical machine translation; statistical model integration; target language acoustic information; text translation; translation probability; word error rate; Acoustic noise; Automatic speech recognition; Decoding; Error analysis; Humans; Natural languages; Probability; Speech recognition; Surface-mount technology; Vocabulary; Machine-aided human translation (MAHT); machine translation; named entity recognition; speech recognition;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2040793
  • Filename
    5393062