Title of article :
Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language
Author/Authors :
Namdar, Saman (NLP Lab, School of ECE); Faili, Hesham (NLP Lab, School of ECE); Khadivi, Shahram (NLP Lab, Computer Engineering & IT Department)
Issue Information :
Quarterly, serial issue number 17, 2012
Abstract :
Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are
generated on the basis of statistical models. In such a system, the parameters are derived from an analysis of a
parallel corpus, and translation quality depends on the system's ability to learn word translations. Enriching SMT
with a suitable morphological analyser dramatically reduces the number of out-of-vocabulary words and the dictionary
size. This effect can be even more pronounced for a highly inflectional, low-resource language such as Persian.
Choosing a suitable granularity for word segmentation may improve alignment quality in the parallel corpus. This paper
explores different segmentation schemes and combinations of word segments in Persian-to-English SMT experiments
and proposes the scheme that gives the best one-to-one alignment, called the En-like scheme. With this scheme,
Persian-to-English translation quality improves by about 3 BLEU points over
phrase-based SMT.
Journal title :
International Journal of Information and Communication Technology Research