Title :
Phrase table pruning by modeling the content of phrases
Author :
Azadi, Fatemeh ; Khadivi, Shahram
Author_Institution :
Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol. (Tehran Polytech.), Tehran, Tehran, Iran
Abstract :
Many of the phrase pairs extracted in phrase-based machine translation systems are of low quality and are not relevant. Their presence in the phrase table not only enlarges it but can also reduce translation quality. Many methods have been presented to prune these noisy phrase pairs using statistics derived from the phrase table. In this paper we propose a new pruning method that, unlike other similar pruning approaches, uses the content of each side of a phrase pair to estimate its relevance and quality. Topic models are used to model the content of phrases. Testing this new pruning method on a Farsi-English system, we could prune more than 50% of the phrase table without a significant loss in BLEU scores, or even with improvements.
Keywords :
language translation; statistical analysis; BLEU scores; Farsi-English system; noisy phrase pairs; phrase pair extraction; phrase table; phrase table pruning method; phrase-based machine translation systems; topic models; Computational linguistics; Computational modeling; Equations; Mathematical model; Noise measurement; Probability; Training; Farsi-English; Phrase Based Statistical Machine Translation; Topic Modeling; phrase-table pruning
Conference_Title :
Telecommunications (IST), 2014 7th International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4799-5358-5
DOI :
10.1109/ISTEL.2014.7000762