• DocumentCode
    3166369
  • Title
    New methods and evaluation experiments on translating TED talks in the IWSLT benchmark
  • Author
    Axelrod, Amittai ; He, Xiaodong ; Deng, Li ; Acero, Alex ; Hwang, Mei-Yuh
  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4945
  • Lastpage
    4948
  • Abstract
    The IWSLT benchmark task is an annual evaluation campaign on spoken language translation held by the International Workshop on Spoken Language Translation (IWSLT). The task is to translate TED talks (www.ted.com). This task presents two unique challenges. First, the underlying topic switches sharply from talk to talk, and each talk contains only tens to hundreds of utterances. The translation system therefore needs to adapt to the current topic quickly and dynamically. Second, unlike other machine translation benchmark tasks, only a very small relevant parallel corpus (transcripts of TED talks) is available. It is therefore necessary to estimate the translation model accurately from limited data. In this paper, we present our recent progress and two new methods on the IWSLT TED talk translation task from Chinese into English. To address the first problem, we use unsupervised topic modeling to select additional topic-dependent parallel data from a globally irrelevant corpus. These additional data slices can then be used to build an unsupervised topic-adapted machine translation system. For the second problem, we develop a discriminative training method to estimate the translation models more accurately. Our experimental evaluation results show that both methods improve the translation quality over a state-of-the-art baseline.
  • Keywords
    language translation; natural language processing; speech processing; IWSLT TED talk translation task; IWSLT benchmark task; International Workshop on Spoken Language Processing; data slices; discriminative training method; machine translation benchmark tasks; spoken language translation system; topic-dependent parallel data; unsupervised topic modeling; unsupervised topic-adapted machine translation system; Adaptation models; Benchmark testing; Data models; Estimation; Helium; Hidden Markov models; Training; IWSLT; discriminative training; spoken language translation; topic adaptation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289029
  • Filename
    6289029