Title :
Statistical language modeling using a variable context length
Author :
Kneser, Reinhard
Author_Institution :
Philips GmbH Forschungslab., Aachen, Germany
Abstract :
In this paper we investigate statistical language models with a variable context length. For such models, the number of relevant words in a context is not fixed, as it is in conventional M-gram models, but depends on the context itself. We develop a measure of the quality of variable-length models and, based on this measure, present a pruning algorithm for creating such models. We further address the question of how the use of a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPA NAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically by using the presented pruning algorithm.
Keywords :
probability; speech recognition; statistical analysis; ARPA NAB corpus; German Verbmobil corpus; backing-off distribution; conventional M-gram models; pruning algorithm; relevant words; statistical language modeling; variable context length; variable-length models; Context modeling; Cutoff frequency; Educational technology; Frequency estimation; History; Natural languages; Performance loss; Probability; Speech recognition; Training data;
Conference_Titel :
Fourth International Conference on Spoken Language, ICSLP 96, Proceedings
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607162