Regional Information Center for Science and Technology - Data selection for statistical machine translation

DocumentCode :

2348621

Title :

Data selection for statistical machine translation

Author :

Liu, Peng ; Zhou, Yu ; Zong, Chengqing

Author_Institution :

National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing, China

fYear :

2010

fDate :

21-23 Aug. 2010

Abstract :

The bilingual corpus has a great effect on the performance of a statistical machine translation system: more data generally leads to better performance, but it also increases the computational load. In this paper, we propose methods to estimate a weight for each sentence and, based on that weight, select the more informative sentences from the training corpus and the development corpus. The translation system is then built and tuned on the resulting compact corpus. Experimental results show that we can obtain competitive performance with much less data.
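The abstract does not say how the sentence weight is estimated. As a rough illustration only, the Python sketch below assumes a hypothetical weight: the fraction of a source sentence's n-grams that are not yet covered by the sentences already selected. It then greedily keeps the highest-weighted sentence pairs up to a budget. This coverage heuristic is a generic data-selection idea and is not claimed to be the authors' method; the names `select_pairs`, `sentence_weight`, `budget`, and `n_max`, and the whitespace tokenization, are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n_max: int) -> Set[NGram]:
    """Return all n-grams of order 1..n_max in a token list."""
    return {tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def sentence_weight(tokens: List[str], covered: Set[NGram], n_max: int) -> float:
    """Hypothetical weight: fraction of the sentence's n-grams not yet covered."""
    grams = ngrams(tokens, n_max)
    if not grams:
        return 0.0
    return len(grams - covered) / len(grams)


def select_pairs(pairs: List[Tuple[str, str]], budget: int, n_max: int = 2) -> List[int]:
    """Greedily pick up to `budget` pair indices, scoring only the source side.

    Assumption: whitespace tokenization and n-gram coverage as the weight.
    """
    tokenized = [src.split() for src, _tgt in pairs]
    covered: Set[NGram] = set()
    remaining = set(range(len(pairs)))
    selected: List[int] = []
    while remaining and len(selected) < budget:
        # Re-score every remaining sentence against the current coverage;
        # ties are broken in favour of the smaller index.
        best = max(remaining,
                   key=lambda i: (sentence_weight(tokenized[i], covered, n_max), -i))
        if sentence_weight(tokenized[best], covered, n_max) == 0.0:
            break  # no remaining sentence adds any new n-gram
        selected.append(best)
        covered |= ngrams(tokenized[best], n_max)
        remaining.discard(best)
    return selected


if __name__ == "__main__":
    corpus = [("the cat sat", "le chat"), ("the cat sat", "le chat"),
              ("a dog ran", "un chien"), ("the dog", "le chien")]
    print(select_pairs(corpus, budget=3))
```

In this toy corpus the duplicated pair at index 1 gets weight 0 once its twin is selected, so it is never chosen and the call returns `[0, 2, 3]`. Re-scoring after each pick is what lets the weight reflect redundancy with the data already kept, at the cost of quadratic time; a real system would need a cheaper update scheme.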

Keywords :

data handling; language translation; natural language processing; statistical analysis; bilingual language corpus; computational load; data selection; development corpus; sentence weight; statistical machine translation; training corpus; Data selection; corpus optimization; statistical machine translation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-6896-6

Type :

conf

DOI :

10.1109/NLPKE.2010.5587827

Filename :

5587827

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2348621