• DocumentCode
    3531045
  • Title
    Resampling auxiliary data for language model adaptation in machine translation for speech
  • Author
    Maskey, Sameer ; Sethy, Abhinav
  • Author_Institution
    IBM T.J. Watson Research Center, New York, NY
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4817
  • Lastpage
    4820
  • Abstract
    Performance of n-gram language models depends to a large extent on the amount of training text material available for building the models and the degree to which this text matches the domain of interest. The language modeling community is showing a growing interest in using large collections of auxiliary textual material to supplement sparse in-domain resources. One of the problems in using such auxiliary corpora is that they may differ significantly from the specific nature of the domain of interest. In this paper, we propose three different methods for adapting language models for a speech-to-speech (S2S) translation system when auxiliary corpora are of a different genre and domain. The proposed methods are based on centroid similarity, n-gram ratios and resampled language models. We show how these methods can be used to select out-of-domain textual data such as newswire text to improve an S2S system. We were able to achieve an overall relative improvement of 3.8% in BLEU score over a baseline system that uses only in-domain conversational data.
  • Keywords
    language translation; speech processing; auxiliary data resampling; language model adaptation; language modeling community; machine translation; n-gram language models; speech to speech translation system; Adaptation model; Entropy; Materials testing; Natural languages; Performance gain; Speech coding; Support vector machine classification; Support vector machines; System testing; Text categorization; Domain Adaptation; Language Model Adaptation; Machine Translation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960709
  • Filename
    4960709