DocumentCode :
2253990
Title :
Evaluation of a language model using a clustered model backoff
Author :
Miller, John W. ; Alleva, Fil
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Volume :
1
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
390
Abstract :
Describes and evaluates a language model using word classes that have been automatically generated by a word clustering algorithm. Class-based language models have been shown to be effective for rapid adaptation, training on small datasets, and reduced memory usage. In terms of model perplexity, prior work has shown diminishing returns for class-based language models constructed using very large training sets. This paper describes a method of using a class model as a backoff to a bigram model, which produced significant benefits even when trained on a large text corpus. Test results on the Whisper continuous speech recognition system show that, for a given word error rate, the clustered bigram model uses two-thirds fewer parameters than a standard bigram model using unigram backoff.
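The idea of backing off from a word bigram to a class model can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the discounting scheme, so the code assumes interpolated absolute discounting as a stand-in, assumes the class model factors as P(class(w) | class(v)) * P(w | class(w)), and uses a hand-written word-to-class map in place of automatic clustering. The class name `ClassBackoffBigram`, the `discount` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter


class ClassBackoffBigram:
    """Word bigram model whose reserved probability mass is spread by a class bigram model."""

    def __init__(self, tokens, word_to_class, discount=0.5):
        pairs = list(zip(tokens, tokens[1:]))
        self.cls = word_to_class
        self.d = discount  # assumed absolute discount, not taken from the paper
        self.bigram = Counter(pairs)
        self.history = Counter(v for v, _ in pairs)
        # Number of distinct successors observed after each history word.
        self.followers = Counter(v for v, _ in self.bigram)
        self.word = Counter(tokens)
        self.class_total = Counter(word_to_class[w] for w in tokens)
        self.class_bigram = Counter(
            (word_to_class[v], word_to_class[w]) for v, w in pairs
        )
        self.class_history = Counter(word_to_class[v] for v, _ in pairs)

    def class_prob(self, w, v):
        """P(class(w) | class(v)) * P(w | class(w)), both by relative frequency."""
        cv, cw = self.cls[v], self.cls[w]
        if self.class_history[cv] == 0 or self.class_total[cw] == 0:
            return 0.0
        transition = self.class_bigram[(cv, cw)] / self.class_history[cv]
        membership = self.word[w] / self.class_total[cw]
        return transition * membership

    def prob(self, w, v):
        """P(w | v): discounted bigram estimate plus class-model share of the reserved mass."""
        n = self.history[v]
        if n == 0:
            # History word never seen with a successor: rely on the class model alone.
            return self.class_prob(w, v)
        seen = max(self.bigram[(v, w)] - self.d, 0.0) / n
        reserved = self.d * self.followers[v] / n
        return seen + reserved * self.class_prob(w, v)


# Toy data: classes are assigned by hand here, not by a clustering algorithm.
tokens = "the cat sat the dog sat the cat ran a dog ran".split()
classes = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
           "sat": "VERB", "ran": "VERB"}
model = ClassBackoffBigram(tokens, classes)

# "a cat" never occurs in the toy corpus, yet it receives probability
# because DET is followed by NOUN and "cat" is a frequent NOUN.
print(model.prob("cat", "a"))  # 0.25
```

In this sketch an unseen word pair still gets a sensible estimate from its classes, so fewer explicit bigram parameters need to be stored, which is the kind of trade-off the abstract reports.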
Keywords :
linguistics; nomograms; pattern classification; speech recognition; Whisper continuous speech recognition system; automatically generated word classes; class-based language model; clustered bigram model; clustered model backoff; large text corpus; large training sets; memory usage; model perplexity; rapid adaptation; unigram backoff; word clustering algorithm; word error rate; Clustering algorithms; Error analysis; Frequency estimation; Memory management; Speech recognition; System testing; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607136
Filename :
607136