DocumentCode :
527301
Title :
An improvement of translation quality with adding key-words in parallel corpus
Author :
Tian, Liang ; Wong, Fai ; Chao, Sam
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
Volume :
3
fYear :
2010
fDate :
11-14 July 2010
Firstpage :
1273
Lastpage :
1278
Abstract :
In this paper, we propose a new approach to improving translation quality by adding the key-words of a sentence to the parallel corpus. The main idea is to find the key-words of sentences that the model cannot translate properly, and then add each key-word to the training corpus on a separate line, as if it were a sentence. In our experiments, we use two statistical machine translation (SMT) systems, a word-based SMT system (ISI-rewrite) and a phrase-based SMT system (Moses), together with a small parallel corpus (4,000 sentences), to test this assumption. The augmented corpus yields a better BLEU score than the original parallel text: the improvement is about 6% for word-based SMT (ISI-rewrite) and about 4% for phrase-based SMT (Moses). Finally, we build an English-Chinese parallel corpus of size 120,000 in this way.
Keywords :
language translation; natural language processing; statistical analysis; BLEU; English Chinese parallel corpus; ISI-rewrite; Moses; key words; phrase based SMT; statistical machine translation; training corpus; translation quality; word based SMT; Buildings; Computational modeling; Cybernetics; Decoding; Machine learning; Probability; Training; Parallel corpus
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
Type :
conf
DOI :
10.1109/ICMLC.2010.5580888
Filename :
5580888