Title :
Domain adaptation for statistical machine translation in development corpus selection
Author :
Zheng, Zhongguang ; He, Zhongjun ; Meng, Yao ; Yu, Hao
Author_Institution :
Fujitsu R&D Center Co., Ltd., Taiwan
Abstract :
The performance of a statistical machine translation (SMT) system depends on its model parameters (e.g., the weights of feature functions), which are usually tuned on a development corpus. Most research to date has focused on algorithms for tuning these parameters, while the selection of the development corpus itself has received little discussion. It is believed that parameters tuned on a suitable corpus will improve translation performance. Rather than exploring new tuning algorithms, this paper aims to select the development corpus for parameter tuning according to the test set. We treat this problem as domain adaptation and propose two methods, based on information retrieval (IR) and text clustering (TC) techniques, respectively. Experimental results show that both methods yield more stable parameter-tuning performance than subjective selection of the development corpus.
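The abstract does not specify the retrieval model behind the IR-based method. As a rough illustration only, the sketch below assumes a plain TF-IDF cosine-similarity retrieval: each test sentence retrieves its `k` most similar sentences from a candidate pool, and the union of those hits becomes the development corpus. The function names (`select_dev_corpus`, `build_idf`, `tfidf`, `cosine`), the whitespace tokenizer, and the parameter `k` are hypothetical choices, not the authors' implementation; the text-clustering method is not shown.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization; a real system would use
    # the same segmentation/tokenization as the SMT pipeline.
    return sentence.lower().split()


def build_idf(docs):
    """Smoothed IDF over the candidate pool (docs are token lists)."""
    n = len(docs)
    df = Counter()
    for toks in docs:
        df.update(set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    # Terms unseen in the pool get weight 0, since no candidate can match them.
    tf = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def select_dev_corpus(candidates, test_set, k=2):
    """Hypothetical IR-style selection: return sorted indices of candidate
    sentences that appear in the top-k (by TF-IDF cosine) for any test sentence."""
    cand_toks = [tokenize(s) for s in candidates]
    idf = build_idf(cand_toks)
    cand_vecs = [tfidf(t, idf) for t in cand_toks]
    selected = set()
    for sent in test_set:
        query = tfidf(tokenize(sent), idf)
        scores = [(cosine(query, v), i) for i, v in enumerate(cand_vecs)]
        # Highest score first; ties broken by lower index for determinism.
        scores.sort(key=lambda p: (-p[0], p[1]))
        for score, i in scores[:k]:
            if score > 0.0:  # skip candidates with no term overlap
                selected.add(i)
    return sorted(selected)


if __name__ == "__main__":
    pool = [
        "the stock market fell sharply today",
        "the football team won the match",
        "investors sold shares as the market dropped",
        "the striker scored two goals in the match",
    ]
    test = ["shares fell on the stock market"]
    print(select_dev_corpus(pool, test, k=2))  # prints [0, 2]
```

Taking the union of per-sentence neighbours keeps every test sentence represented in the selected set; a practical version would probably also cap the total size of the development corpus and deduplicate near-identical sentences.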
Keywords :
information retrieval; language translation; statistical analysis; IR; SMT; TC; corpus selection development; domain adaptation; feature functions; model parameters; statistical machine translation; text clustering; tuning parameters; Adaptation model; Clustering methods; Feature extraction; Information retrieval; NIST; Training; Tuning
Conference_Title :
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-7821-7
DOI :
10.1109/IUCS.2010.5666775