• DocumentCode
    1606139
  • Title
    Statistical Machine Translation based on LDA
  • Author
    Zhengxian Gong ; Yu Zhang ; Guodong Zhou
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
  • fYear
    2010
  • Firstpage
    286
  • Lastpage
    290
  • Abstract
    Current Statistical Machine Translation (SMT) systems translate one sentence at a time, ignoring document-level information. Consequently, translation models are learned only at the sentence level, and document context is generally overlooked. In this paper, we introduce document topics to help an SMT system produce target sentences. First, the parallel training corpus, which has underlying document boundaries, is segmented into multiple documents, and a monolingual LDA model is used to determine which topics these documents belong to. Next, the background phrase table is enhanced with the probability distribution of a document over topics. Evaluation shows that the proposed approach significantly improves the BLEU score on Chinese-to-English machine translation.
  • Keywords
    document handling; language translation; natural language processing; statistical analysis; statistical distributions; BLEU score; Chinese to English machine translation; LDA; SMT system; background phrase table; document boundary; document contexts; monolingual LDA model; one sentence translation; parallel training corpus; probability distribution; statistical machine translation; Adaptation model; Biological system modeling; Conferences; Decoding; Hidden Markov models; NIST; Training; Adaptation; Document; LDA; SMT
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Universal Communication Symposium (IUCS), 2010 4th International
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-7821-7
  • Type
    conf
  • DOI
    10.1109/IUCS.2010.5666182
  • Filename
    5666182
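
The abstract says the background phrase table is enhanced with a document's probability distribution over topics, but it does not give the formula. The following is only a minimal sketch of one plausible reading: a linear mixture that weights per-topic phrase probabilities p(e | f, z) by the document's topic weights p(z | d). The function name, the table layout, the placeholder phrases ("f", "e1", "e2"), and all numbers are hypothetical and are not taken from the paper; the topic weights are assumed to come from an already trained LDA model.

```python
from collections import defaultdict


def topic_weighted_phrase_probs(topic_phrase_table, doc_topic_dist):
    """Mix per-topic phrase probabilities using a document's topic weights.

    topic_phrase_table: {topic: {(source, target): p(target | source, topic)}}
    doc_topic_dist:     {topic: p(topic | document)}, assumed to sum to 1
    Returns {(source, target): sum over topics of p(z | d) * p(e | f, z)}.
    This mixture is an assumption, not the paper's stated method.
    """
    mixed = defaultdict(float)
    for topic, weight in doc_topic_dist.items():
        # Topics missing from the table contribute nothing.
        for pair, prob in topic_phrase_table.get(topic, {}).items():
            mixed[pair] += weight * prob
    return dict(mixed)


# Hypothetical toy data: one source phrase "f" with two candidate targets.
topic_phrase_table = {
    0: {("f", "e1"): 0.8, ("f", "e2"): 0.2},
    1: {("f", "e1"): 0.4, ("f", "e2"): 0.6},
}
doc_topic_dist = {0: 0.75, 1: 0.25}  # would come from LDA inference

adapted = topic_weighted_phrase_probs(topic_phrase_table, doc_topic_dist)
for (source, target), prob in sorted(adapted.items()):
    print(f"p({target} | {source}, doc) = {prob:.3f}")
```

Under this assumed mixture, a document dominated by topic 0 pulls the adapted probabilities toward topic 0's translation preferences, while the result remains a valid distribution over the targets of each source phrase.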