  • DocumentCode
    35587
  • Title
    Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation
  • Author
    Rui Wang; Hai Zhao; Bao-Liang Lu; Masao Utiyama; Eiichiro Sumita
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • Volume
    23
  • Issue
    7
  • fYear
    2015
  • fDate
    July 2015
  • Firstpage
    1209
  • Lastpage
    1220
  • Abstract
    Larger n-gram language models (LMs) perform better in statistical machine translation (SMT). However, existing approaches to constructing larger LMs have two main drawbacks: 1) it is not convenient to obtain larger corpora in the same domain as the bilingual parallel corpora used in SMT; 2) most previous studies focus only on monolingual information from the target corpora, and redundant n-grams have not been fully utilized in SMT. Continuous-space language models (CSLMs), especially neural network language models (NNLMs), have been shown to greatly improve the accuracy of the probabilities estimated for predicting target words. However, most of these CSLM and NNLM approaches still consider only monolingual information or require additional corpora. In this paper, we propose a novel neural-network-based bilingual LM growing method. Compared to the existing approaches, the proposed method enables us to use the bilingual parallel corpus for LM growing in SMT. The results show that our new method significantly outperforms the existing approaches in both SMT performance and computational efficiency.
  • Keywords
    language translation; neural nets; CSLM; NNLM; SMT; bilingual LM growing method; bilingual continuous-space language model; bilingual parallel corpora; bilingual parallel corpus; monolingual information; neural network language model; statistical machine translation; Decoding; IEEE transactions; Joining processes; Probability; Speech; Speech processing; Training; Continuous-space language model; language model growing (LMG); neural network language model; statistical machine translation (SMT);
  • fLanguage
    English
  • Journal_Title
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2425220
  • Filename
    7090970