DocumentCode :
302100
Title :
Multilingual stochastic n-gram class language models
Author :
Jardino, Michèle
Author_Institution :
Lab. d'Inf. pour la Mécanique et les Sci. de l'Ingénieur, CNRS, Orsay, France
Volume :
1
fYear :
1996
fDate :
7-10 May 1996
Firstpage :
161
Abstract :
Stochastic language models are widely used in continuous speech recognition systems, where a priori probabilities of word sequences are needed. These probabilities are usually given by n-gram word models, estimated on very large training texts. As n increases, it becomes harder to find reliable statistics, even with huge texts. Grouping words into classes is a way to overcome this problem. We have developed an automatic, language-independent classification procedure, which is able to optimize the classification of tens of millions of untagged words in less than a few hours on a Unix workstation. With this language-independent approach, three corpora, each containing about 30 million words of newspaper text in French, German and English, have been mapped into different numbers of classes. From these classifications, bi-gram and tri-gram class language models have been built. The perplexities of held-out test texts have been assessed, showing that tri-gram class models give lower values than those obtained with tri-gram word models for all three languages.
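As an illustration of the terms used in the abstract, the following is a minimal sketch of a bi-gram class model and of perplexity, assuming the common factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i), where c_i is the class of word w_i. It is not the classification procedure of the paper: the word-to-class map is supplied by hand here, whereas the paper derives it automatically. The function names, the toy sentence and the class labels are all hypothetical, and the estimates are unsmoothed, so an unseen class transition gives a zero probability and the perplexity computation fails.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Return a function giving maximum-likelihood P(word | previous word)
    under a bi-gram class model (no smoothing; illustrative only)."""
    classes = [word2class[w] for w in tokens]
    class_count = Counter(classes)                   # occurrences of each class
    word_count = Counter(tokens)                     # occurrences of each word
    pair_count = Counter(zip(classes, classes[1:]))  # consecutive class pairs
    context_count = Counter(classes[:-1])            # class seen as left context

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        p_class = pair_count[(c_prev, c)] / context_count[c_prev]  # P(c | c_prev)
        p_word = word_count[word] / class_count[c]                 # P(w | c)
        return p_class * p_word

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence: exp of the negative mean log-probability
    of each word given its predecessor."""
    log_sum = sum(math.log(prob(a, b)) for a, b in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


# Hypothetical toy data: four word types mapped by hand into three classes.
toy_tokens = "the cat sees the dog".split()
toy_classes = {"the": "DET", "cat": "N", "dog": "N", "sees": "V"}

model = train_class_bigram(toy_tokens, toy_classes)
print(perplexity(model, toy_tokens))  # about 1.414 on this toy training text
```

Because statistics are shared among all words of a class, the model needs far fewer parameters than a word n-gram model, which is the motivation the abstract gives for grouping words. A practical model would add smoothing and would be evaluated on held-out text rather than on its own training data.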
Keywords :
grammars; natural languages; probability; speech recognition; English; French; German; Unix workstation; automatic language independent classification; bi-gram class language models; continuous speech recognition systems; corpora; multilingual stochastic n-gram class; n-gram word models; newspaper texts; perplexities; stochastic language models; training texts; tri-gram class language models; untagged words classification; word probabilities; word sequences; Iterative methods; Natural languages; Probability; Speech recognition; Statistics; Stochastic processes; Stochastic systems; Testing; Vocabulary; Workstations;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
ISSN :
1520-6149
Print_ISBN :
0-7803-3192-3
Type :
conf
DOI :
10.1109/ICASSP.1996.540315
Filename :
540315