Regional Information Center for Science and Technology (RICEST) - Statistical language modeling using a variable context length

DocumentCode :

312029

Title :

Statistical language modeling using a variable context length

Author :

Kneser, Reinhard

Author_Institution :

Philips GmbH Forschungslab., Aachen, Germany

Volume :

fYear :

1996

fDate :

3-6 Oct 1996

Firstpage :

494

Abstract :

In this paper we investigate statistical language models with a variable context length. For such models, the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and present a pruning algorithm, based on this measure, for creating such models. Furthermore, we address the question of how the use of a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPANAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically by using the presented pruning algorithm.

Keywords :

probability; speech recognition; statistical analysis; ARPANAB corpus; German Verbmobil corpus; backing-off distribution; conventional M-gram models; pruning algorithm; relevant words; statistical language modeling; variable context length; variable-length models; Context modeling; Cutoff frequency; Educational technology; Frequency estimation; History; Natural languages; Performance loss; Probability; Speech recognition; Training data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location :

Philadelphia, PA

Print_ISBN :

0-7803-3555-4

Type :

conf

DOI :

10.1109/ICSLP.1996.607162

Filename :

607162

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=312029