Title :
Improved backing-off for M-gram language modeling
Author :
Kneser, Reinhard; Ney, Hermann
Author_Institution :
Philips GmbH Forschungslab., Aachen, Germany
Abstract :
In stochastic language modeling, backing-off is a widely used method for coping with the sparse data problem. For unseen events, this method backs off to a less specific distribution. In this paper, we propose using distributions that are specifically optimized for the task of backing-off. Two different theoretical derivations lead to distributions that are quite different from the probability distributions usually used for backing-off. Experiments show an improvement of about 10% in perplexity and 5% in word error rate.
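The abstract describes backing-off only in words. The following is a minimal sketch, assuming a bigram model with absolute discounting and a fixed, illustrative discount of 0.75. Neither the discount value nor the names `BackoffBigram` and `perplexity` come from the paper. The sketch contrasts the conventional backing-off distribution (unigram relative frequency) with a distribution proportional to the number of distinct predecessors of each word, the form commonly associated with Kneser-Ney smoothing. It also computes perplexity, the measure in which the abstract reports its improvement.

```python
import math
from collections import Counter, defaultdict


class BackoffBigram:
    """Bigram model with absolute discounting and backing-off (illustrative sketch).

    backoff="relative": beta(w) = N(w) / N, the ordinary unigram relative frequency.
    backoff="kn":       beta(w) proportional to the number of distinct words seen
                        immediately before w (continuation-count distribution).
    """

    def __init__(self, tokens, discount=0.75, backoff="kn"):
        self.d = discount                  # assumed fixed discount, not estimated from data
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.hist = Counter()              # N(v): number of bigram tokens with history v
        self.followers = defaultdict(set)  # distinct words seen after v
        predecessors = defaultdict(set)    # distinct words seen before w
        for (v, w), c in self.bigrams.items():
            self.hist[v] += c
            self.followers[v].add(w)
            predecessors[w].add(v)
        vocab = set(tokens)
        if backoff == "kn":
            weight = {w: len(predecessors[w]) for w in vocab}
        elif backoff == "relative":
            unigrams = Counter(tokens)
            weight = {w: unigrams[w] for w in vocab}
        else:
            raise ValueError(f"unknown backoff: {backoff}")
        total = sum(weight.values())
        self.beta = {w: weight[w] / total for w in vocab}

    def prob(self, w, v):
        """Return p(w | v) under the backing-off scheme."""
        n_v = self.hist[v]
        if n_v == 0:                       # unseen history: use the backing-off distribution directly
            return self.beta.get(w, 0.0)
        c = self.bigrams[(v, w)]
        if c > 0:                          # seen bigram: discounted relative frequency
            return (c - self.d) / n_v
        # Probability mass freed by discounting, redistributed over unseen successors
        # in proportion to beta, renormalized over the words not seen after v.
        alpha = self.d * len(self.followers[v]) / n_v
        unseen_mass = 1.0 - sum(self.beta[u] for u in self.followers[v])
        if unseen_mass <= 0.0:             # degenerate case: no unseen word has beta mass
            return 0.0
        return alpha * self.beta.get(w, 0.0) / unseen_mass


def perplexity(model, tokens):
    """exp of the average negative log-probability over the bigrams in tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(model.prob(w, v)) for v, w in pairs)
    return math.exp(-log_sum / len(pairs))


if __name__ == "__main__":
    train = "san francisco is foggy . i like san francisco . i like tea . tea is hot .".split()
    test = "i like san francisco .".split()
    for kind in ("relative", "kn"):
        m = BackoffBigram(train, backoff=kind)
        print(kind, m.prob("tea", "i"), m.prob("francisco", "i"), perplexity(m, test))
```

In this toy corpus, "francisco" and "tea" are equally frequent, so the relative-frequency model backs off to them equally after the unseen pairs ("i", "tea") and ("i", "francisco"). Because "francisco" only ever follows "san", the distinct-predecessor model gives "tea" twice the backing-off probability. In this sketch, a word that never appears in second position would receive zero backing-off mass under `backoff="kn"`; a practical model would add sentence-boundary tokens or further smoothing.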
Keywords :
grammars; natural languages; probability; speech processing; speech recognition; statistical analysis; stochastic processes; backing-off; distributions; experiments; perplexity; sparse data problem; stochastic language modeling; word error rate; Error analysis; History; Interpolation; Laboratories; Probability distribution; Smoothing methods; Training data
Conference_Title :
1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95)
Conference_Location :
Detroit, MI
Print_ISBN :
0-7803-2431-5
DOI :
10.1109/ICASSP.1995.479394