• DocumentCode
    302103
  • Title

    Clustering words for statistical language models based on contextual word similarity

  • Author

    Farhat, Azarshid ; Isabelle, Jean-François ; O'Shaughnessy, Douglas

  • Author_Institution
    INRS Telecommun., Ile des Soeurs, Que., Canada
  • Volume
    1
  • fYear
    1996
  • fDate
    7-10 May 1996
  • Firstpage
    180
  • Abstract
    This paper describes a new word clustering approach for statistical language modeling. The classification criterion used by our approach is contextual word similarity, applied in a simplified clustering algorithm. This clustering technique was tested on the INRS speech recognizer using the spontaneous English corpus ATIS. Automatic word classification increases the word accuracy rate by 8.6%, with a perplexity reduction of about 6.9%.
  • Keywords
    natural languages; pattern classification; speech recognition; statistical analysis; ATIS; INRS speech recognizer; automatic word classification; classification criteria; clustering algorithm; contextual word similarity; perplexity reduction; spontaneous English corpora; statistical language models; word clustering approach; Automatic speech recognition; Business; Clustering algorithms; Context modeling; Natural languages; Smoothing methods; Speech recognition; Stochastic processes; Testing; US Department of Transportation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-3192-3
  • Type

    conf

  • DOI
    10.1109/ICASSP.1996.540320
  • Filename
    540320