• DocumentCode
    24574
  • Title

    Disambiguating Discourse Connectives for Statistical Machine Translation

  • Author

    Meyer, Thomas ; Hajlaoui, Najeh ; Popescu-Belis, Andrei

  • Author_Institution
    Google, Inc., Zurich, Switzerland
  • Volume
    23
  • Issue
    7
  • fYear
    2015
  • fDate
    Jul-15
  • Firstpage
    1184
  • Lastpage
    1197
  • Abstract
    This paper shows that the automatic labeling of discourse connectives with the relations they signal, prior to machine translation (MT), can be used by phrase-based statistical MT systems to improve their translations. This improvement is demonstrated here when translating from English to four target languages (French, German, Italian and Arabic) using several test sets from recent MT evaluation campaigns. Using automatically labeled data for training, tuning and testing MT systems is beneficial on condition that labels are sufficiently accurate, typically above 70%. To reach such an accuracy, a large array of features for discourse connective labeling (morpho-syntactic, semantic and discursive) are extracted using state-of-the-art tools and exploited in factored MT models. The translation of connectives is improved significantly, between 0.7% and 10% as measured with the dedicated ACT metric. The improvements depend mainly on the level of ambiguity of the connectives in the test sets.
  • Keywords
    language translation; natural language processing; statistical analysis; Arabic language; French language; German language; Italian language; discourse connective disambiguation; phrase-based statistical MT system; statistical machine translation; Feature extraction; IEEE transactions; Labeling; Speech; Testing; Training; Tuning; Discourse connectives; machine translation (MT)
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2422576
  • Filename
    7084603