DocumentCode :
2382564
Title :
Semantic clustering for adaptive language modeling
Author :
Kneser, Reinhard ; Peters, Jochen
Author_Institution :
Philips GmbH Forschungslab., Aachen, Germany
Volume :
2
fYear :
1997
fDate :
21-24 Apr 1997
Firstpage :
779
Abstract :
In this paper we present efficient clustering algorithms for two novel class-based approaches to adaptive language modeling. In contrast to bigram and trigram class models, the proposed classes are related to the distribution and co-occurrence of words within complete text units and are thus mostly of a semantic nature. We introduce adaptation techniques such as adaptive linear interpolation and an approximation to minimum discriminant estimation, and show how to use the automatically derived semantic structure to allow fast adaptation to a special topic or style. In experiments performed on the Wall-Street-Journal corpus, intuitively convincing semantic classes were obtained. The resulting adaptive language models were significantly better than a standard cache model. Compared to a static model, a reduction in perplexity of up to 31% was achieved.
Keywords :
adaptive estimation; interpolation; maximum likelihood estimation; natural languages; Wall-Street-Journal corpus; adaptation techniques; adaptive language modeling; adaptive linear interpolation; class-based approaches; complete text units; efficient clustering algorithms; intuitively convincing semantic classes; minimum discriminant estimation; perplexity reduction; semantic clustering; word co-occurrence; word distribution; Algorithm design and analysis; Clustering algorithms; Educational technology; History; Interpolation; Natural languages; Stochastic processes; Text recognition; Training data; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
ISSN :
1520-6149
Print_ISBN :
0-8186-7919-0
Type :
conf
DOI :
10.1109/ICASSP.1997.596041
Filename :
596041