Title :
Clustering words for statistical language models based on contextual word similarity
Author :
Farhat, Azarshid ; Isabelle, Jean-François ; O'Shaughnessy, Douglas
Author_Institution :
INRS Telecommun., Ile des Soeurs, Que., Canada
Abstract :
This paper describes a new word clustering approach for statistical language modeling. The classification criterion used by our approach is contextual word similarity, applied within a simplified clustering algorithm. This clustering technique was tested on the INRS speech recognizer using the spontaneous English corpus ATIS. Automatic word classification increases the word accuracy rate by 8.6%, with a perplexity reduction of about 6.9%.
Keywords :
natural languages; pattern classification; speech recognition; statistical analysis; ATIS; INRS speech recognizer; automatic word classification; classification criteria; clustering algorithm; contextual word similarity; perplexity reduction; spontaneous English corpora; statistical language models; word clustering approach; Automatic speech recognition; Business; Clustering algorithms; Context modeling; Natural languages; Smoothing methods; Speech recognition; Stochastic processes; Testing; US Department of Transportation
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.540320