Clustering words for statistical language models based on contextual word similarity

Author

Farhat, Azarshid ; Isabelle, Jean-François ; O'Shaughnessy, Douglas

Author_Institution

INRS Telecommun., Ile des Soeurs, Que., Canada

Volume

1

fYear

1996

fDate

7-10 May 1996

Firstpage

180

Abstract

This paper describes a new word clustering approach for statistical language modeling. The classification criterion used by our approach is contextual word similarity, applied in a simplified clustering algorithm. This clustering technique was tested on the INRS speech recognizer using the spontaneous English corpus ATIS. Automatic word classification increases the word accuracy rate by 8.6%, with a perplexity reduction of about 6.9%.

Keywords

natural languages; pattern classification; speech recognition; statistical analysis; ATIS; INRS speech recognizer; automatic word classification; classification criteria; clustering algorithm; contextual word similarity; perplexity reduction; spontaneous English corpora; statistical language models; word clustering approach; Automatic speech recognition; Business; Clustering algorithms; Context modeling; Natural languages; Smoothing methods; Speech recognition; Stochastic processes; Testing; US Department of Transportation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on

Conference_Location

Atlanta, GA

ISSN

1520-6149

Print_ISBN

0-7803-3192-3

Type

conf

DOI

10.1109/ICASSP.1996.540320

Filename

540320