DocumentCode
11499
Title
Exploring Diverse Features for Statistical Machine Translation Model Pruning
Author
Mei Tu ; Yu Zhou ; Chengqing Zong
Author_Institution
Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
Volume
23
Issue
11
fYear
2015
fDate
Nov. 2015
Firstpage
1847
Lastpage
1857
Abstract
In phrase-based and hierarchical phrase-based statistical machine translation systems, translation performance depends heavily on the size and quality of the translation table. To meet real-time response requirements, prior work has filtered the translation table. However, most existing methods rely on one or two constraints that act as hard rules, such as disallowing phrase-pairs with low translation probabilities. Such constraints can be too rigid because they consider only a single factor rather than composite factors. In this paper, we propose a machine learning-based framework that integrates multiple features for translation model pruning. Experimental results show that our framework can prune 80% of the phrase-pairs and 70% of the hierarchical rules while retaining the quality of the translation models as measured by the BLEU evaluation metric. Our study further shows that our method can select the most useful phrase-pairs and rules, including those that are low in frequency but still very useful.
Keywords
filtering theory; language translation; learning (artificial intelligence); BLEU evaluation metric; composite factors; diverse features; hard rules; hierarchical phrase; machine learning; phrase-pairs; real-time response; statistical machine translation model pruning; translation table filter; Bidirectional control; Data models; Decoding; IEEE transactions; Syntactics; Training; Training data; Classification; statistical machine translation (SMT); syntactic constraints; translation model pruning;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2456413
Filename
7156075