DocumentCode
2382564
Title
Semantic clustering for adaptive language modeling
Author
Kneser, Reinhard ; Peters, Jochen
Author_Institution
Philips GmbH Forschungslab., Aachen, Germany
Volume
2
fYear
1997
fDate
21-24 Apr 1997
Firstpage
779
Abstract
In this paper we present efficient clustering algorithms for two novel class-based approaches to adaptive language modeling. In contrast to bigram and trigram class models, the proposed classes are related to the distribution and co-occurrence of words within complete text units and are therefore mostly semantic in nature. We introduce adaptation techniques such as adaptive linear interpolation and an approximation to minimum discriminant estimation, and show how the automatically derived semantic structure can be used for fast adaptation to a particular topic or style. In experiments on the Wall Street Journal corpus, intuitively convincing semantic classes were obtained. The resulting adaptive language models were significantly better than a standard cache model. Compared to a static model, a perplexity reduction of up to 31% was achieved.
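The abstract names adaptive linear interpolation and reports its results as perplexity. As a rough illustration only, the following Python sketch mixes a static model with a topic-specific model. It re-estimates the mixture weights on the recent history with EM and then computes perplexity. EM weight re-estimation is a common way to adapt interpolation weights, and it is assumed here rather than taken from the paper. The unigram component models, the helper names `em_weights`, `interpolate`, and `perplexity`, and the toy probabilities are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical simplification: a component model maps a word to a probability.
# Real language models would also condition on the preceding word history.
Model = Callable[[str], float]


def em_weights(models: Sequence[Model], history: Sequence[str],
               iterations: int = 20) -> List[float]:
    """Re-estimate linear interpolation weights on the recent history with EM."""
    k = len(models)
    lam = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        counts = [0.0] * k
        for w in history:
            joint = [lam[i] * models[i](w) for i in range(k)]
            total = sum(joint)
            if total == 0.0:
                continue  # word unseen by every component: carries no evidence
            for i in range(k):
                counts[i] += joint[i] / total  # posterior responsibility of model i
        norm = sum(counts)
        if norm == 0.0:
            break
        lam = [c / norm for c in counts]  # M-step: normalized expected counts
    return lam


def interpolate(models: Sequence[Model], lam: Sequence[float]) -> Model:
    """Return the linear mixture P(w) = sum_i lam_i * P_i(w)."""
    models, lam = list(models), list(lam)
    return lambda w: sum(l * m(w) for l, m in zip(lam, models))


def perplexity(model: Model, text: Sequence[str]) -> float:
    """exp of the average negative log-probability; assumes every P(w) > 0."""
    if not text:
        raise ValueError("perplexity needs at least one word")
    log_sum = sum(math.log(model(w)) for w in text)
    return math.exp(-log_sum / len(text))


if __name__ == "__main__":
    # Toy unigram distributions (hypothetical numbers, not from the paper).
    static = {"the": 0.5, "market": 0.1, "stock": 0.1, "game": 0.3}
    topic = {"the": 0.4, "market": 0.3, "stock": 0.3, "game": 0.0}
    models = [lambda w: static.get(w, 0.0), lambda w: topic.get(w, 0.0)]
    history = ["the", "stock", "market", "stock"]
    lam = em_weights(models, history)
    print("weights:", lam)
    print("static PP:", perplexity(models[0], history))
    print("adapted PP:", perplexity(interpolate(models, lam), history))
```

Because EM never decreases the likelihood of the data it is fitted to, the adapted weights explain the history at least as well as the uniform starting weights. In this toy case they shift toward the topic model. A real system would evaluate perplexity on held-out text rather than on the adaptation history itself.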
Keywords
adaptive estimation; interpolation; maximum likelihood estimation; natural languages; Wall-Street-Journal corpus; adaptation techniques; adaptive language modeling; adaptive linear interpolation; class-based approaches; complete text units; efficient clustering algorithms; intuitively convincing semantic classes; minimum discriminant estimation; perplexity reduction; semantic clustering; word co-occurrence; word distribution; Algorithm design and analysis; Clustering algorithms; Educational technology; History; Interpolation; Natural languages; Stochastic processes; Text recognition; Training data; Vocabulary;
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location
Munich
ISSN
1520-6149
Print_ISBN
0-8186-7919-0
Type
conf
DOI
10.1109/ICASSP.1997.596041
Filename
596041