DocumentCode :
1619023
Title :
Domain adaptation for statistical machine translation in development corpus selection
Author :
Zheng, Zhongguang ; He, Zhongjun ; Meng, Yao ; Yu, Hao
Author_Institution :
Fujitsu R&D Center Co., Ltd., Taiwan
fYear :
2010
Firstpage :
2
Lastpage :
7
Abstract :
The performance of a statistical machine translation (SMT) system is affected by its model parameters (e.g., the weights of feature functions), which are usually tuned on a development corpus. Most research to date has focused on algorithms for tuning parameters, whereas the selection of the development corpus has received little discussion. It is believed that parameters tuned on a suitable corpus will improve translation performance. Instead of exploring new algorithms, this paper aims to select the development corpus for parameter tuning according to the test set. We treat this problem as domain adaptation and propose two methods, based on information retrieval (IR) and text clustering (TC) techniques, respectively. Experimental results show that both methods yield more stable tuning performance than subjective selection of the development corpus.
Keywords :
information retrieval; language translation; statistical analysis; IR; SMT; TC; corpus selection development; domain adaptation; feature functions; model parameters; statistical machine translation; text clustering; tuning parameters; Adaptation model; Clustering methods; Feature extraction; Information retrieval; NIST; Training; Tuning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-7821-7
Type :
conf
DOI :
10.1109/IUCS.2010.5666775
Filename :
5666775