• DocumentCode
    312029
  • Title
    Statistical language modeling using a variable context length
  • Author
    Kneser, Reinhard
  • Author_Institution
    Philips GmbH Forschungslab., Aachen, Germany
  • Volume
    1
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    494
  • Abstract
    In this paper, we investigate statistical language models with a variable context length. For such models, the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and, based on this measure, present a pruning algorithm for creating such models. We also address the question of how a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPANAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically by using the presented pruning algorithm.
  • Keywords
    probability; speech recognition; statistical analysis; ARPANAB corpus; German Verbmobil corpus; backing-off distribution; conventional M-gram models; pruning algorithm; relevant words; statistical language modeling; variable context length; variable-length models; Context modeling; Cutoff frequency; Educational technology; Frequency estimation; History; Natural languages; Performance loss; Probability; Speech recognition; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607162
  • Filename
    607162