DocumentCode :
3708651
Title :
Development of Indonesian-Japanese statistical machine translation using lemma translation and additional post-process
Author :
Mohammad Anugrah Sulaeman;Ayu Purwarianti
Author_Institution :
School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
fYear :
2015
Firstpage :
54
Lastpage :
58
Abstract :
Although research on statistical machine translation has grown rapidly, little work has been done on Indonesian-Japanese statistical machine translation. Previous research on this language pair has revealed several problems in the translation process, such as low corpus coverage, unknown words, and sentence reordering. In this research, we propose two methods to address these problems: lemma translation with generated surface forms, and an additional post-process. Lemma translation uses the lemma and POS tag of each word in the translation process. The additional post-process consists of rule-based katakana translation and unknown word substitution. The experimental data were collected from the Japanese Language Proficiency Test (JLPT) Level 3 and comprise 1,132 sentences in total. Experimental results show that these methods improve on the baseline system, with a 116% increase in BLEU score for Japanese-to-Indonesian translation and a 26% increase in BLEU score for Indonesian-to-Japanese translation.
Keywords :
"Buildings","Decoding","Surface treatment","Data models","Electrical engineering","Informatics","Probability"
Publisher :
IEEE
Conference_Title :
Electrical Engineering and Informatics (ICEEI), 2015 International Conference on
Print_ISBN :
978-1-4673-6778-3
Electronic_ISBN :
2155-6830
Type :
conf
DOI :
10.1109/ICEEI.2015.7352469
Filename :
7352469