DocumentCode
1787084
Title
Phrase table pruning by modeling the content of phrases
Author
Azadi, Fatemeh ; Khadivi, Shahram
Author_Institution
Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol. (Tehran Polytech.) Tehran, Tehran, Iran
fYear
2014
fDate
9-11 Sept. 2014
Firstpage
535
Lastpage
538
Abstract
Many of the phrase pairs extracted in phrase-based machine translation systems are of low quality and are not relevant. Their presence in the phrase table not only enlarges it but can also reduce translation quality. Many methods have been presented to prune these noisy phrase pairs using statistics derived from the phrase table. In this paper we propose a new pruning method that, unlike other similar pruning approaches, uses the content of each side of the phrase pair to estimate its relevance and quality. Topic models are used to model the content of phrases. Testing this new pruning method on a Farsi-English system, we could prune more than 50% of the phrase table without significant loss in BLEU scores, or even with improvements.
Keywords
language translation; statistical analysis; BLEU scores; Farsi-English system; noisy phrase pairs; phrase pair extraction; phrase table; phrase table pruning method; phrase-based machine translation systems; topic models; Computational linguistics; Computational modeling; Equations; Mathematical model; Noise measurement; Probability; Training; Farsi-English; Phrase Based Statistical Machine Translation; Topic Modeling; phrase-table pruning
fLanguage
English
Publisher
ieee
Conference_Titel
Telecommunications (IST), 2014 7th International Symposium on
Conference_Location
Tehran
Print_ISBN
978-1-4799-5358-5
Type
conf
DOI
10.1109/ISTEL.2014.7000762
Filename
7000762