Title of article

Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language

Author/Authors

Namdar, Saman (author), NLP Lab, School of ECE; Faili, Hesham (author), NLP Lab, School of ECE; Khadivi, Shahram (author), NLP Lab, Computer Engineering & IT Department

Issue Information

Quarterly, serial issue number 17, year 2012

Pages

14

From page

39

To page

52

Abstract

Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models. The model parameters are derived from an analysis of a parallel corpus, and translation quality depends on how well the system learns word translations. Enriching an SMT system with a suitable morphological analyser dramatically reduces both the number of out-of-vocabulary words and the dictionary size. The effect can be even greater for a highly inflectional, low-resource language such as Persian. Choosing a suitable granularity for word segmentation may improve alignment quality in the parallel corpus. This paper explores different segmentation schemes and combinations of word segments in Persian-to-English SMT experiments, and proposes the scheme that gives the best one-to-one alignment, called the En-like scheme. With this scheme, Persian-to-English translation quality improves by about 3 BLEU points over phrase-based SMT.

Journal title

International Journal of Information and Communication Technology Research

Serial Year

2012

Record number

831759