Title of article
Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language
Author/Authors
Namdar, Saman (author), NLP Lab, School of ECE; Faili, Hesham (author), NLP Lab, School of ECE; Khadivi, Shahram (author), NLP Lab, Computer Engineering & IT Department
Issue Information
Quarterly, serial issue no. 17, 2012
Pages
14
From page
39
To page
52
Abstract
Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated on the
basis of statistical models whose parameters are estimated from a parallel corpus. SMT quality therefore depends on
how well word translations can be learned from that corpus. Enriching SMT with a suitable morphological analyser
dramatically reduces the number of out-of-vocabulary (OOV) words and the dictionary size, and the effect is even more
pronounced for a highly inflectional, low-resource language such as Persian. Choosing a suitable granularity for word
segmentation may also improve alignment quality in the parallel corpus. In this paper, different segmentation schemes
and combinations of word segments are explored in Persian-to-English SMT experiments, and the scheme that yields the
best one-to-one alignment, called the En-like scheme, is proposed. Using this scheme improves Persian-to-English
translation quality by about 3 BLEU points over the phrase-based SMT baseline.
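The abstract's central claim, that splitting inflected words into stems and affixes reduces OOV words and dictionary size, can be illustrated with a toy sketch. Everything in it is an assumption for illustration: the transliterated Persian-like words, the three-suffix inventory, and the helper names (`segment`, `tokenize`, `vocab_and_oov`) are hypothetical, and naive suffix stripping is NOT the authors' En-like scheme, which this record does not describe.

```python
# Toy illustration (hypothetical data and suffixes): how stem + suffix
# segmentation shrinks the vocabulary and the test-side OOV rate.
# This is NOT the paper's En-like scheme.

# Hypothetical suffix inventory for transliterated Persian-like words.
SUFFIXES = ("ash", "ha", "am")

def segment(word):
    """Split off at most one known suffix (longest first), marking it with '+'."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        # Require a stem of at least two characters to avoid degenerate splits.
        if word.endswith(suf) and len(word) > len(suf) + 1:
            return [word[: -len(suf)], "+" + suf]
    return [word]

def tokenize(words, use_segmentation):
    """Return the token stream, optionally segmented into stem + suffix."""
    if not use_segmentation:
        return list(words)
    return [piece for w in words for piece in segment(w)]

def vocab_and_oov(train, test, use_segmentation):
    """Vocabulary size of the training side and OOV rate of the test side."""
    train_tokens = tokenize(train, use_segmentation)
    test_tokens = tokenize(test, use_segmentation)
    vocab = set(train_tokens)
    oov = sum(1 for t in test_tokens if t not in vocab)
    return len(vocab), oov / len(test_tokens)

# Hypothetical toy corpus: every test word is an unseen inflection of a seen stem.
train = "ketab ketabha ketabash daftar daftaram khane khaneha".split()
test = "ketabam daftarha khaneash ketab".split()

for seg in (False, True):
    size, rate = vocab_and_oov(train, test, seg)
    print(f"segmented={seg}: vocab={size}, OOV rate={rate:.2f}")
# segmented=False: vocab=7, OOV rate=0.75
# segmented=True: vocab=6, OOV rate=0.00
```

Without segmentation, three of the four test words were never seen as whole forms; after segmentation every stem and suffix is covered by the training side. A real system would use a morphological analyser rather than suffix stripping, and the segmented Persian side would then be word-aligned against English, where the choice of granularity affects alignment quality.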
Journal title
International Journal of Information and Communication Technology Research
Serial Year
2012
Record number
831759