DocumentCode
312029
Title
Statistical language modeling using a variable context length
Author
Kneser, Reinhard
Author_Institution
Philips GmbH Forschungslab., Aachen, Germany
Volume
1
fYear
1996
fDate
3-6 Oct 1996
Firstpage
494
Abstract
In this paper we investigate statistical language models with a variable context length. For such models, the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and, based on this measure, present a pruning algorithm for creating such models. We further address the question of how a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPA NAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically by using the presented pruning algorithm.
Keywords
probability; speech recognition; statistical analysis; ARPANAB corpus; German Verbmobil corpus; backing-off distribution; conventional M-gram models; pruning algorithm; relevant words; statistical language modeling; variable context length; variable-length models; Context modeling; Cutoff frequency; Educational technology; Frequency estimation; History; Natural languages; Performance loss; Probability; Speech recognition; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607162
Filename
607162