  • DocumentCode
    730849
  • Title

    Token-level interpolation for class-based language models

  • Author

    Levit, Michael ; Stolcke, Andreas ; Chang, Shuangyu ; Parthasarathy, Sarangarajan

  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    5426
  • Lastpage
    5430
  • Abstract
    We describe a method for interpolation of class-based n-gram language models. Our algorithm is an extension of the traditional EM-based approach that optimizes perplexity of the training set with respect to a collection of n-gram language models linearly combined in the probability space. However, unlike prior work, it naturally supports context-dependent interpolation for class-based LMs. In addition, the method works naturally with the recently introduced word-phrase-entity (WPE) language models that unify words, phrases and entities into a single statistical framework. Applied to the Calendar scenario of the Personal Assistant domain, our method achieved significant perplexity reduction and improved word error rates.
  • Keywords
    expectation-maximisation algorithm; interpolation; natural language processing; probability; speech processing; statistical analysis; EM-based approach; WPE language models; calendar scenario; class-based LM; class-based n-gram language models; context-dependent interpolation; expectation-maximisation approach; perplexity reduction; personal assistant domain; probability space; token-level interpolation; word error rates; word-phrase-entity language models; Adaptation models; Computational modeling; Context; Context modeling; Interpolation; Probability; Training; class-based language models; language model interpolation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type

    conf

  • DOI
    10.1109/ICASSP.2015.7179008
  • Filename
    7179008