• DocumentCode
    3124217
  • Title
    Phrase-based data selection for language model adaptation in spoken language translation
  • Author
    Shixiang Lu ; Wei Wei ; Xiaoyin Fu ; Lichun Fan ; Bo Xu
  • Author_Institution
    Interactive Digital Media Technol. Res. Center, Inst. of Autom., Beijing, China
  • fYear
    2012
  • fDate
    5-8 Dec. 2012
  • Firstpage
    193
  • Lastpage
    196
  • Abstract
    In this paper, we propose an unsupervised phrase-based data selection model that addresses the problem of selecting non-domain-specific language model (LM) training data to build an adapted LM for use in a spoken language translation (SLT) system. We aim to find the LM training sentences that are similar to the translation task. Compared with traditional bag-of-words models, the phrase-based data selection model is more effective because it captures contextual information by modeling the selection of a phrase as a whole, rather than the selection of single words in isolation. Large-scale experimental results demonstrate that our approach significantly outperforms state-of-the-art approaches on both LM perplexity and translation performance.
  • Keywords
    language translation; speech processing; LM training sentence; SLT system; language model adaptation; no-domain-specific language model; spoken language translation system; unsupervised phrase-based data selection model; Adaptation models; Context modeling; Data models; Speech; Speech recognition; Training; Training data; contextual information; phrase-based data selection; spoken language translation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
  • Conference_Location
    Kowloon
  • Print_ISBN
    978-1-4673-2506-6
  • Electronic_ISBN
    978-1-4673-2505-9
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2012.6423483
  • Filename
    6423483