• DocumentCode
    527301
  • Title
    An improvement of translation quality with adding key-words in parallel corpus
  • Author
    Tian, Liang ; Wong, Fai ; Chao, Sam
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
  • Volume
    3
  • fYear
    2010
  • fDate
    11-14 July 2010
  • Firstpage
    1273
  • Lastpage
    1278
  • Abstract
    In this paper, we propose a new approach to improving translation quality by adding the key-words of a sentence to the parallel corpus. The main idea is to find the key-words of sentences that the model cannot translate properly and then add them to the training corpus on a separate line, as a sentence of their own. In our experiments, we use two statistical machine translation (SMT) systems, word-based SMT (ISI-rewrite) and phrase-based SMT (Moses), and a small parallel corpus (4,000 sentences) to test this assumption. The resulting BLEU scores are better than those obtained with the original parallel text: the improvement is about 6% for word-based SMT (ISI-rewrite) and about 4% for phrase-based SMT (Moses). Finally, we build an English-Chinese parallel corpus of size 120,000 in this way.
  • Keywords
    language translation; natural language processing; statistical analysis; BLEU; English Chinese parallel corpus; ISI-rewrite; Moses; key words; phrase based SMT; statistical machine translation; training corpus; translation quality; word based SMT; Buildings; Computational modeling; Cybernetics; Decoding; Machine learning; Probability; Training; Parallel corpus; Statistical machine translation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
  • Conference_Location
    Qingdao
  • Print_ISBN
    978-1-4244-6526-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2010.5580888
  • Filename
    5580888