  • DocumentCode
    323515
  • Title
    Building class-based language models with contextual statistics
  • Author
    Bai, Shuanghu ; Li, Haizhou ; Lin, Zhiwei ; Yuan, Baosheng
  • Author_Institution
    Inst. of Syst. Sci., Nat. Univ. of Singapore, Singapore
  • Volume
    1
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    173
  • Abstract
    Novel clustering algorithms that use the contextual statistics of words are proposed for class-based language models. Minimum discriminative information (MDI) is used as the distance measure. Three algorithms are implemented to build bigram language models for a vocabulary of 50,000 words over a corpus of more than 200 million words. The computational cost of the algorithms and the resulting LM perplexity are studied. Comparisons between the MDI algorithm and the maximum mutual information algorithm are also given to demonstrate the effectiveness and efficiency of the new algorithms. It is shown that the MDI approaches make tree-building clustering feasible for a large vocabulary.
  • Keywords
    context-sensitive grammars; information theory; natural languages; pattern recognition; speech processing; speech recognition; statistical analysis; MDI algorithm; bigram language models; class-based language models; clustering algorithms; computational cost; contextual statistics; distance measure; efficiency; large vocabulary continuous speech recognition; maximum mutual information algorithm; minimum discriminative information; tree-building clustering; words; Computational efficiency; Context modeling; Distortion measurement; Mutual information; Partitioning algorithms; Statistics; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.674395
  • Filename
    674395