DocumentCode :
302314
Title :
A class based language model for speech recognition
Author :
Ward, Wayne ; Issar, Sunil
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume :
1
fYear :
1996
fDate :
7-10 May 1996
Firstpage :
416
Abstract :
Class-based language models are often used when there is insufficient data to generate a word-based language model directly from the training data. In this approach, similar items are clustered into classes, an n-gram language model for the class tokens is generated, and then the probabilities for words in a class are distributed according to the smoothed relative unigram frequencies of the words. Classes expand to lists of single-word tokens; that is, a class cannot represent a sequence of lexical tokens. We propose a more general mechanism for defining a language model class, in which classes are expanded to word sequences through finite-state networks. This allows expansion to word sequences without requiring compound words in the lexicon. Where finite-state models are too brittle to represent sentence-level strings, they can still represent class-level strings (for example, dates, names and titles). We compared perplexity on the ARPA Dec93 ATIS Test set and found that the new model reduced perplexity by approximately 17 percent (relative).
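The sketch below illustrates the idea in the abstract under stated assumptions; it is not the authors' implementation. All names (`FSN`, `unigram_class`, `sentence_logprob`, `perplexity`), the toy CITY and AIRLINE classes, the bigram order, and every probability are hypothetical. A class is modeled as a small finite-state network that emits a word sequence. The conventional single-word class falls out as the special case of one arc per word, weighted by smoothed relative unigram frequency; add-alpha smoothing is an assumed choice, since the abstract does not name the smoothing method. The segmentation of the sentence into class tokens is taken as given, whereas a real recognizer would search over segmentations.

```python
import math
from collections import defaultdict


class FSN:
    """Hypothetical finite-state network: arcs[state] lists (word, next_state, prob);
    stop[state] is the probability of ending there. For each state, outgoing arc
    probabilities plus its stop probability are meant to sum to 1."""

    def __init__(self, start, arcs, stop):
        self.start, self.arcs, self.stop = start, arcs, stop

    def prob(self, words):
        # Forward algorithm: sum the probability of every path emitting `words`.
        alpha = {self.start: 1.0}
        for w in words:
            nxt = defaultdict(float)
            for state, p in alpha.items():
                for word, dest, q in self.arcs.get(state, []):
                    if word == w:
                        nxt[dest] += p * q
            alpha = nxt
        return sum(p * self.stop.get(s, 0.0) for s, p in alpha.items())


def unigram_class(counts, alpha=1.0):
    """Conventional single-word class as an FSN: one arc per word, weighted by
    add-alpha smoothed relative unigram frequency (smoothing choice assumed)."""
    total = sum(counts.values()) + alpha * len(counts)
    arcs = {0: [(w, 1, (c + alpha) / total) for w, c in counts.items()]}
    return FSN(start=0, arcs=arcs, stop={1: 1.0})


def sentence_logprob(units, bigram, classes):
    """units: (token, words) pairs, where token is a plain word or a class name and
    words is the word sequence it covers. bigram[(prev, token)] is a hypothetical
    smoothed class-token bigram probability. Returns the log2 probability."""
    logp, prev = 0.0, "<s>"
    for token, words in units:
        logp += math.log2(bigram[(prev, token)])
        if token in classes:  # expand the class token through its network
            logp += math.log2(classes[token].prob(words))
        prev = token
    return logp + math.log2(bigram[(prev, "</s>")])


def perplexity(logprob, n_events):
    """Perplexity from a total log2 probability over n_events predicted tokens."""
    return 2 ** (-logprob / n_events)


# Toy multi-word class: CITY expands to "boston" or "new york".
city = FSN(start=0,
           arcs={0: [("boston", 2, 0.5), ("new", 1, 0.5)], 1: [("york", 2, 1.0)]},
           stop={2: 1.0})
airline = unigram_class({"delta": 3, "united": 1})  # single-word class
classes = {"CITY": city, "AIRLINE": airline}

bigram = {("<s>", "fly"): 0.5, ("fly", "to"): 0.5,
          ("to", "CITY"): 0.5, ("CITY", "</s>"): 0.5}
units = [("fly", ["fly"]), ("to", ["to"]), ("CITY", ["new", "york"])]

lp = sentence_logprob(units, bigram, classes)
# Predicted events: fly, to, new, york, </s>
print(perplexity(lp, 5))  # → 2.0
```

Because a single-word class is just a one-arc-per-word network, the same scoring code covers both the conventional model and the multi-word expansion, and "new york" never needs to be a compound word in the lexicon.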
Keywords :
grammars; natural languages; probability; speech recognition; ARPA Dec93 ATIS Test set; class based language model; class-level strings; clustering method; finite-state networks; n-gram language model; perplexity; probabilities; sentence-level strings; single word tokens; smoothed relative unigram frequencies; speech recognition; training data; word sequences; Books; Data mining; Databases; Decoding; Natural languages; Robustness; Speech recognition; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
ISSN :
1520-6149
Print_ISBN :
0-7803-3192-3
Type :
conf
DOI :
10.1109/ICASSP.1996.541121
Filename :
541121