  • DocumentCode
    3708651
  • Title
    Development of Indonesian-Japanese statistical machine translation using lemma translation and additional post-process
  • Author
    Mohammad Anugrah Sulaeman;Ayu Purwarianti
  • Author_Institution
    School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
  • fYear
    2015
  • Firstpage
    54
  • Lastpage
    58
  • Abstract
    Although research on statistical machine translation has grown rapidly, little work has addressed Indonesian-Japanese statistical machine translation. Previous research on this language pair has revealed several problems in the translation process, such as low corpus coverage, unknown words, and sentence reordering. In this research, we propose two methods to address these problems: lemma translation with generated surface forms, and an additional post-process. Lemma translation uses the lemma and POS tag of each word in the translation process. The additional post-process consists of rule-based katakana translation and unknown word substitution. Experimental data were collected from JLPT (Japanese Language Proficiency Test) Level 3, with a total of 1132 sentences. Experiments with these methods showed an improvement over the baseline system, with a 116% increase in BLEU score for Japanese-to-Indonesian translation and a 26% increase in BLEU score for Indonesian-to-Japanese translation.
  • Keywords
    "Buildings","Decoding","Surface treatment","Data models","Electrical engineering","Informatics","Probability"
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering and Informatics (ICEEI), 2015 International Conference on
  • Print_ISBN
    978-1-4673-6778-3
  • Electronic_ISBN
    2155-6830
  • Type
    conf
  • DOI
    10.1109/ICEEI.2015.7352469
  • Filename
    7352469