DocumentCode :
170325
Title :
Improved Chinese-Japanese phrase-based MT quality using an extended quasi-parallel corpus
Author :
Hao Wang ; Wei Yang ; Lepage, Y.
Author_Institution :
Sch. of Comput. Eng. & Technol., Shanghai Univ., Shanghai, China
fYear :
2014
fDate :
16-18 May 2014
Firstpage :
6
Lastpage :
10
Abstract :
State-of-the-art phrase-based machine translation (MT) systems usually require large parallel corpora for training. Both the quality and the quantity of the training data directly influence the performance of such systems. A lack of open-source bilingual corpora for a given language pair results in lower reported translation scores for that pair. This is the case for Chinese-Japanese. In this paper, we propose to extend an initial parallel corpus with quasi-parallel sentences instead of adding new parallel sentences. The extension is obtained using monolingual analogical associations. Our experiments show that using such quasi-parallel corpora improves the performance of Chinese-Japanese translation systems.
Keywords :
language translation; natural language processing; Chinese-Japanese phrase-based MT quality; Chinese-Japanese translation systems; extended quasiparallel corpus; monolingual analogical associations; open-source bilingual corpora; phrase-based machine translation systems; quasiparallel sentences; Computational linguistics; Educational institutions; Hidden Markov models; Mathematical model; Natural language processing; Training; Training data; analogy; machine translation; paraphrasing; quasi-parallel data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Progress in Informatics and Computing (PIC), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-2033-4
Type :
conf
DOI :
10.1109/PIC.2014.6972285
Filename :
6972285