DocumentCode
37793
Title
Improving Statistical Machine Translation Using Bayesian Word Alignment and Gibbs Sampling
Author
Mermer, C. ; Saraclar, Murat ; Sarikaya, R.
Author_Institution
TUBITAK BILGEM, Kocaeli, Turkey
Volume
21
Issue
5
fYear
2013
fDate
May-13
Firstpage
1090
Lastpage
1101
Abstract
We present a Bayesian approach to word alignment inference in IBM Models 1 and 2. In the original approach, word translation probabilities (i.e., model parameters) are estimated using the expectation-maximization (EM) algorithm. In the proposed approach, they are random variables with a prior and are integrated out during inference. We use Gibbs sampling to infer the word alignment posteriors. The inferred word alignments are compared against EM and variational Bayes (VB) inference in terms of their end-to-end translation performance on several language pairs and corpus types, with corpora of up to 15 million sentence pairs. We show that Bayesian inference outperforms both EM and VB in the majority of test cases. Further analysis reveals that the proposed method effectively addresses the high-fertility rare word problem in EM and the unaligned rare word problem in VB, achieves higher agreement and vocabulary coverage rates than both, and leads to smaller phrase tables.
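The sampling step described in the abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling for a Bayesian IBM Model 1. It assumes a symmetric Dirichlet prior with hyperparameter theta on the word translation distributions. The prior choice, the NULL token, the function name gibbs_align, and the toy data in the usage note are illustrative assumptions, not details taken from the paper. With the translation probabilities integrated out, each alignment link is resampled with probability proportional to (N(e, f) + theta) / (N(e) + V * theta), where the counts exclude the link being resampled and V is the vocabulary size of the language being generated.

```python
import random
from collections import defaultdict

NULL = "<NULL>"  # hypothetical token standing in for the empty word


def gibbs_align(bitext, theta=0.01, iterations=50, seed=0):
    """Collapsed Gibbs sampling sketch for a Bayesian IBM Model 1.

    bitext: list of (e_tokens, f_tokens) sentence pairs.
    Returns one alignment list per pair; a[j] is an index into
    [NULL] + e_tokens giving the word that generated f_tokens[j].
    Only the final sample is returned (an assumption of this sketch).
    """
    rng = random.Random(seed)
    e_sents = [[NULL] + list(e) for e, _ in bitext]
    f_sents = [list(f) for _, f in bitext]
    vocab_f = len({w for f in f_sents for w in f})

    pair_count = defaultdict(int)  # N(e, f): links from word e to word f
    word_count = defaultdict(int)  # N(e): all links leaving word e

    # Initialize alignments at random and collect the counts they imply.
    alignments = []
    for e, f in zip(e_sents, f_sents):
        a = [rng.randrange(len(e)) for _ in f]
        for j, i in enumerate(a):
            pair_count[e[i], f[j]] += 1
            word_count[e[i]] += 1
        alignments.append(a)

    for _ in range(iterations):
        for e, f, a in zip(e_sents, f_sents, alignments):
            for j, fj in enumerate(f):
                # Remove the current link so the counts exclude position j.
                old = e[a[j]]
                pair_count[old, fj] -= 1
                word_count[old] -= 1

                # Predictive probability of each candidate link with the
                # translation parameters integrated out.
                weights = [
                    (pair_count[ei, fj] + theta)
                    / (word_count[ei] + vocab_f * theta)
                    for ei in e
                ]
                a[j] = rng.choices(range(len(e)), weights=weights)[0]

                # Add the newly sampled link back into the counts.
                new = e[a[j]]
                pair_count[new, fj] += 1
                word_count[new] += 1
    return alignments
```

For example, gibbs_align([("the house".split(), "das haus".split()), ("the book".split(), "das buch".split())]) returns two lists of two indices each, with index 0 meaning the NULL word. Estimating alignment posteriors, as the abstract describes, would require accumulating samples over many iterations after a burn-in period rather than keeping only the last one.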
Keywords
belief networks; expectation-maximisation algorithm; inference mechanisms; language translation; natural language processing; probability; random processes; sampling methods; word processing; Bayesian word alignment posterior inference; EM; Gibbs sampling; IBM models; VB; corpora types; expectation-maximization algorithm; high-fertility rare word problem; language pairs; phrase tables; random variables; sentence pairs; statistical machine translation performance improvement; unaligned rare word problem; variational Bayes inference; vocabulary coverage rates; Bayesian methods; Computational modeling; Hidden Markov models; Inference algorithms; Random variables; Speech; Speech processing; statistical machine translation (SMT); word alignment
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2013.2244087
Filename
6425427