• DocumentCode
    1787084
  • Title
    Phrase table pruning by modeling the content of phrases
  • Author
    Azadi, Fatemeh ; Khadivi, Shahram
  • Author_Institution
    Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol. (Tehran Polytech.), Tehran, Tehran, Iran
  • fYear
    2014
  • fDate
    9-11 Sept. 2014
  • Firstpage
    535
  • Lastpage
    538
  • Abstract
    Many of the phrase pairs extracted in phrase-based machine translation systems are of low quality and are not relevant. Their presence in the phrase table not only enlarges it but can also reduce translation quality. Many methods have been proposed to prune these noisy phrase pairs using statistics derived from the phrase table. In this paper we propose a new pruning method that, unlike other similar pruning approaches, uses the content of each side of a phrase pair to estimate its relevance and quality. Topic models are used to model the content of phrases. Testing this new pruning method on a Farsi-English system, we could prune more than 50% of the phrase table without significant loss, or even with improvements, in BLEU scores.
  • Keywords
    language translation; statistical analysis; BLEU scores; Farsi-English system; noisy phrase pairs; phrase pair extraction; phrase table; phrase table pruning method; phrase-based machine translation systems; topic models; Computational linguistics; Computational modeling; Equations; Mathematical model; Noise measurement; Probability; Training; Farsi-English; Phrase Based Statistical Machine Translation; Topic Modeling; phrase-table pruning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications (IST), 2014 7th International Symposium on
  • Conference_Location
    Tehran
  • Print_ISBN
    978-1-4799-5358-5
  • Type
    conf
  • DOI
    10.1109/ISTEL.2014.7000762
  • Filename
    7000762