DocumentCode :
2660230
Title :
Efficient data selection for machine translation
Author :
Mandal, A. ; Vergyri, D. ; Wang, W. ; Zheng, J. ; Stolcke, A. ; Tur, G. ; Hakkani-Tür, D. ; Ayan, N.F.
Author_Institution :
Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
261
Lastpage :
264
Abstract :
The performance of statistical machine translation (SMT) systems relies on the availability of a large parallel corpus, which is used to estimate translation probabilities. However, generating such a corpus is a long and expensive process. In this paper, we introduce two methods for efficient selection of training data to be translated by humans. Our methods are motivated by active learning and aim to choose new data that adds maximal information to the currently available data pool. The first method uses a measure of disagreement between multiple SMT systems, whereas the second uses a perplexity criterion. We performed experiments on Chinese-English data in multiple domains and test sets. Our results show that we can select only one-fifth of the additional training data and achieve similar or better translation performance compared with using all available data.
Keywords :
language translation; learning (artificial intelligence); natural language processing; probability; statistical analysis; Chinese-English data; active learning; data pool; data selection; parallel corpus; perplexity criterion; statistical machine translation systems; training data; translation performance; translation probability; Availability; Humans; Information retrieval; Natural languages; Probability; Speech; Surface-mount technology; System testing; Training data; Web pages; data selection; machine translation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
Type :
conf
DOI :
10.1109/SLT.2008.4777890
Filename :
4777890