DocumentCode :
835963
Title :
An Online Relevant Set Algorithm for Statistical Machine Translation
Author :
Tillmann, Christoph ; Zhang, Tong
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY
Volume :
16
Issue :
7
fYear :
2008
Firstpage :
1274
Lastpage :
1286
Abstract :
This paper presents a novel online relevant set algorithm for a linearly scored block sequence translation model. The key component is a new procedure that directly optimizes the global scoring function used by a statistical machine translation (SMT) decoder. This training procedure treats the decoder as a black box and can therefore be used to optimize any decoding scheme. The algorithm is evaluated with two feature types: 1) commonly used probabilistic features, such as translation, language, or distortion model probabilities, and 2) binary features. In particular, encouraging results on a standard Arabic-English translation task are presented for a translation system that uses only binary feature functions. To further demonstrate the effectiveness of the training algorithm, it is compared in detail with the widely used minimum-error-rate (MER) training algorithm, using the same decoder and feature set. The online algorithm is simplified by introducing so-called "seed" block sequences, which allow training to be carried out without a gold-standard block translation. The online training algorithm is extremely fast and, in some experiments, also improves translation scores over the MER algorithm.
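To make the idea of training a linear scoring function through a black-box decoder concrete, here is a minimal Python sketch of a generic perceptron-style online update. It is swapped in for illustration and is not the relevant set algorithm described in the paper. It assumes a toy setting: a block sequence is a list of hypothetical string block IDs, the only features are binary block-identity counts, the hypothetical decoder built by make_decoder simply picks the best of a fixed candidate list, and the update target is the seed block sequence. All names (features, score, make_decoder, perceptron_epoch) are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy assumption: a feature vector is a sparse dict from feature name to value.
FeatureVec = Dict[str, float]
Decoder = Callable[[FeatureVec], List[str]]


def features(blocks: Sequence[str]) -> FeatureVec:
    """Hypothetical binary features: count how often each block ID occurs."""
    phi: FeatureVec = {}
    for b in blocks:
        phi[b] = phi.get(b, 0.0) + 1.0
    return phi


def score(w: FeatureVec, blocks: Sequence[str]) -> float:
    """Linear score s(b) = w . phi(b)."""
    return sum(w.get(k, 0.0) * v for k, v in features(blocks).items())


def make_decoder(candidates: List[List[str]]) -> Decoder:
    """Hypothetical black-box decoder: returns the highest-scoring candidate
    under the current weights (the first one wins on ties)."""
    def decode(w: FeatureVec) -> List[str]:
        return max(candidates, key=lambda c: score(w, c))
    return decode


def perceptron_epoch(
    w: FeatureVec, data: List[Tuple[Decoder, List[str]]], eta: float = 1.0
) -> FeatureVec:
    """One online pass over (decoder, seed sequence) pairs.

    The decoder is only ever called as decode(w), i.e. treated as a black box.
    When its output differs from the seed, w moves toward the seed's features
    and away from the hypothesis's features (plain perceptron update)."""
    for decode, seed in data:
        hyp = decode(w)
        if hyp != seed:
            for k, v in features(seed).items():
                w[k] = w.get(k, 0.0) + eta * v
            for k, v in features(hyp).items():
                w[k] = w.get(k, 0.0) - eta * v
    return w


if __name__ == "__main__":
    cands = [["a", "b"], ["a", "c"], ["d"]]
    dec = make_decoder(cands)
    w = perceptron_epoch({}, [(dec, ["a", "c"])])
    print(w, dec(w))
```

With all weights at zero the decoder returns the first candidate, ["a", "b"]; one update moves weight from "b" to "c", after which the decoder returns the seed ["a", "c"]. The point of the black-box design is that perceptron_epoch never inspects how decode searches, so any decoder that maps weights to a best output could be plugged in.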
Keywords :
language translation; natural languages; probability; statistical analysis; Arabic-English translation; binary features; global scoring function; linearly scored block sequence translation model; minimum-error-rate training algorithm; online relevant set algorithm; probabilistic features; statistical machine translation decoder; Decoding; Machine learning; Natural languages; Probability; Statistics; Tagging; Discriminative learning; online algorithm; statistical machine translation;
fLanguage :
English
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2008.921760
Filename :
4599396