• DocumentCode
    284667
  • Title
    Cooccurrence smoothing for stochastic language modeling
  • Author
    Essen, Ute; Steinbiss, Volker
  • Author_Institution
    Philips GmbH Forschungslaboratorien, Aachen, Germany
  • Volume
    1
  • fYear
    1992
  • fDate
    23-26 Mar 1992
  • Firstpage
    161
  • Abstract
    Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. The authors derive the cooccurrence smoothing technique for stochastic language modeling and give experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English 1-million-word corpus. (An illustrative sketch of the smoothing idea and the perplexity computation follows this record.)
  • Keywords
    grammars; speech analysis and processing; speech recognition; stochastic processes; English; German; cooccurrence smoothing; maximum-likelihood estimation; stochastic language modeling; test-set perplexity; word-bigram language models; Context modeling; Estimation theory; Maximum likelihood estimation; Natural languages; Parameter estimation; Random variables; Smoothing methods; Speech recognition; Stochastic processes; Testing
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    1992 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92)
  • Conference_Location
    San Francisco, CA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-0532-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.1992.225947
  • Filename
    225947
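
The abstract mentions cooccurrence smoothing of word-bigram language models and evaluation by test-set perplexity. The sketch below is only an illustration of that general idea under stated assumptions: the toy corpus, the confusion estimate based on shared predecessor contexts, and the fixed interpolation weight are choices made for this example, not details taken from the paper, whose exact estimator and combination scheme may differ.

    # Illustrative sketch: a toy word-bigram model with a cooccurrence-style
    # smoothing step and a test-set perplexity computation. Corpus, confusion
    # estimate, and interpolation weight are assumptions for this example.
    from collections import Counter, defaultdict
    import math

    train = "the cat sat on the mat the dog sat on the mat the cat ate fish".split()
    test  = "the dog ate fish".split()      # contains the unseen bigram (dog, ate)
    vocab = sorted(set(train))

    # Maximum-likelihood bigram estimates.
    bigram  = Counter(zip(train, train[1:]))
    history = Counter(train[:-1])
    def p_ml(w2, w1):
        return bigram[(w1, w2)] / history[w1] if history[w1] else 0.0

    # Confusion distribution P_C(w2 | w1): two words count as confusable to the
    # extent that they follow the same predecessor words in the training data.
    shared = defaultdict(float)
    for w1 in vocab:
        for w2 in vocab:
            shared[(w1, w2)] = sum(p_ml(w1, c) * p_ml(w2, c) for c in vocab)
    def p_conf(w2, w1):
        z = sum(shared[(w1, v)] for v in vocab)
        return shared[(w1, w2)] / z if z else 1.0 / len(vocab)

    # Cooccurrence-smoothed bigram, interpolated with the ML estimate so that
    # well-observed bigrams are not washed out (the 0.5 weight is arbitrary).
    def p_smooth(w2, w1):
        co = sum(p_conf(w2, w) * p_ml(w, w1) for w in vocab)
        return 0.5 * p_ml(w2, w1) + 0.5 * co

    # Test-set perplexity: PP = exp(-(1/N) * sum_i log p(w_i | w_{i-1})).
    # The 1e-12 floor only keeps log(0) finite for unsmoothed zero estimates.
    def perplexity(p, words):
        logs = [math.log(max(p(w2, w1), 1e-12)) for w1, w2 in zip(words, words[1:])]
        return math.exp(-sum(logs) / len(logs))

    print("ML bigram perplexity:       %.1f" % perplexity(p_ml, test))
    print("Smoothed bigram perplexity: %.1f" % perplexity(p_smooth, test))

On this toy data the unseen bigram (dog, ate) has zero maximum-likelihood probability but receives a nonzero smoothed probability through words that share predecessors, so the smoothed model's test-set perplexity drops sharply; the 14% and 10% improvements quoted in the abstract were measured on real German and English corpora with the authors' own formulation.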