• DocumentCode
    1988260
  • Title
    Improving language models by using distant information
  • Author
    Brun, A.; Langlois, D.; Smaili, K.
  • Author_Institution
    Univ. Nancy 2, Nancy
  • fYear
    2007
  • fDate
    12-15 Feb. 2007
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This study examines how to take advantage of distant information in statistical language models. We show that it is possible to use n-gram models whose histories differ from those used during training; we call these crossing context models. Our study deals with classical and distant n-gram models. A mixture of four models is proposed and evaluated. A bigram linear mixture achieves a 14% improvement in perplexity, and the trigram mixture outperforms the standard trigram by 5.6%. These improvements are obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system, where it achieves a slight improvement in word error rate on the data used for the francophone evaluation campaign ESTER [1]. Finally, the impact of the proposed crossing context language models on performance is analyzed across speakers.
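The linear mixture of standard and distant n-gram models described in the abstract can be sketched as follows. This is a minimal illustration assuming additive smoothing and fixed interpolation weights; the paper's actual component models, smoothing method, and weight estimation are not specified in the abstract, so every function name and parameter below is a hypothetical stand-in.

```python
from collections import Counter

def train_ngram_pairs(tokens, d=1):
    """Count (w_{i-d}, w_i) pairs: d=1 gives a standard bigram,
    d>1 a distant bigram that skips the d-1 intervening words."""
    pairs = Counter(zip(tokens, tokens[d:]))
    hists = Counter(tokens[:-d])
    return pairs, hists

def prob(pairs, hists, h, w, vocab_size, alpha=0.1):
    """Additively smoothed conditional probability P(w | h)."""
    return (pairs[(h, w)] + alpha) / (hists[h] + alpha * vocab_size)

def mixture_prob(components, w, vocab_size, weights):
    """Linear interpolation of component model probabilities.
    components: list of (pairs, hists, history_word) triples."""
    return sum(lam * prob(pairs, hists, h, w, vocab_size)
               for lam, (pairs, hists, h) in zip(weights, components))

# Toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
vocab = len(set(tokens))                  # 5 word types
std = train_ngram_pairs(tokens, d=1)      # standard bigram counts
dist = train_ngram_pairs(tokens, d=2)     # distance-2 bigram counts

# P(sat | cat) from the standard bigram, interpolated with
# P(sat | the) from the distant bigram two positions back.
p = mixture_prob([std + ("cat",), dist + ("the",)], "sat", vocab,
                 weights=[0.7, 0.3])
```

In practice the interpolation weights of such a mixture are typically tuned on held-out data (for example with EM), rather than fixed by hand as in this sketch.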
  • Keywords
    speech recognition; statistical analysis; bigram linear mixture; context language models; distant information; distant n-gram models; francophone evaluation; speech recognition system; trigram mixture; word error rate; Context modeling; Error analysis; History; Natural languages; Neodymium; Speech recognition; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2007 9th International Symposium on Signal Processing and Its Applications (ISSPA 2007)
  • Conference_Location
    Sharjah
  • Print_ISBN
    978-1-4244-0778-1
  • Electronic_ISBN
    978-1-4244-1779-8
  • Type
    conf
  • DOI
    10.1109/ISSPA.2007.4555480
  • Filename
    4555480