  • DocumentCode
    2382564
  • Title
    Semantic clustering for adaptive language modeling
  • Author
    Kneser, Reinhard ; Peters, Jochen
  • Author_Institution
    Philips GmbH Forschungslab., Aachen, Germany
  • Volume
    2
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    779
  • Abstract
    In this paper we present efficient clustering algorithms for two novel class-based approaches to adaptive language modeling. In contrast to bigram and trigram class models, the proposed classes are related to the distribution and co-occurrence of words within complete text units and are thus mostly semantic in nature. We introduce adaptation techniques such as adaptive linear interpolation and an approximation to minimum discriminant estimation, and show how to use the automatically derived semantic structure to allow fast adaptation to a specific topic or style. In experiments performed on the Wall-Street-Journal corpus, intuitively convincing semantic classes were obtained. The resulting adaptive language models were significantly better than a standard cache model. Compared to a static model, a perplexity reduction of up to 31% was achieved.
  • Keywords
    adaptive estimation; interpolation; maximum likelihood estimation; natural languages; Wall-Street-Journal corpus; adaptation techniques; adaptive language modeling; adaptive linear interpolation; class-based approaches; complete text units; efficient clustering algorithms; intuitively convincing semantic classes; minimum discriminant estimation; perplexity reduction; semantic clustering; word co-occurrence; word distribution; Algorithm design and analysis; Clustering algorithms; Educational technology; History; Interpolation; Natural languages; Stochastic processes; Text recognition; Training data; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type
    conf
  • DOI
    10.1109/ICASSP.1997.596041
  • Filename
    596041