DocumentCode :
2348621
Title :
Data selection for statistical machine translation
Author :
Liu, Peng ; Zhou, Yu ; Zong, Chengqing
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
5
Abstract :
The bilingual language corpus has a great effect on the performance of a statistical machine translation system. More data leads to better performance, but it also increases the computational load. In this paper, we propose methods to estimate sentence weights and, based on these weights, to select the more informative sentences from the training corpus and the development corpus. The translation system is then built and tuned on the resulting compact corpus. The experimental results show that competitive performance can be obtained with much less data.
Keywords :
data handling; language translation; natural language processing; statistical analysis; bilingual language corpus; computational load; data selection; development corpus; sentence weight; statistical machine translation; training corpus; corpus optimization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587827
Filename :
5587827