Title :
Extension Schemes for the Alignment Model of English-Malayalam Statistical Machine Translator
Author :
Sebastian, Mary Priya ; Kurian, K. Sheena ; Kumar, G. Santhosh
Author_Institution :
Dept. of Comput. Sci. & Eng., Rajagiri Sch. of Eng. & Technol., Kochi, India
Abstract :
In Statistical Machine Translation (SMT) from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. An English-Malayalam parallel corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target languages before subjecting them to training. This paper deals with certain techniques that can be adopted for improving the alignment model of SMT. Incorporating parts-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous while setting up the alignments. The presence of Malayalam words with predictable translations has further contributed to reducing the insignificant alignments. Moreover, the reduction of unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Keywords :
language translation; natural language processing; parallel processing; statistical analysis; BLEU; English sentence; English-Malayalam statistical machine translator; F measure; Malayalam sentence; SMT; WER evaluation metrics; bilingual corpus; extension schemes; parallel corpus; parts of speech information; source language; statistical models; target language; training phase; unwanted alignment reduction; word to word alignment model; Decoding; Hidden Markov models; Probability; Speech; Tagging; Training; Vectors; English Malayalam translation; alignment; machine translation; training;
Conference_Title :
Advances in Computing and Communications (ICACC), 2012 International Conference on
Conference_Location :
Cochin, Kerala
Print_ISBN :
978-1-4673-1911-9
DOI :
10.1109/ICACC.2012.18