DocumentCode :
310523
Title :
Distant bigram language modelling using maximum entropy
Author :
Simons, M. ; Ney, H. ; Martin, S.C.
Author_Institution :
Lehrstuhl für Inf. VI, Tech. Hochschule Aachen, Germany
Volume :
2
fYear :
1997
fDate :
21-24 Apr 1997
Firstpage :
787
Abstract :
We apply the maximum entropy approach to so-called distant bigram language modelling. In addition to the usual unigram and bigram dependencies, we use distant bigram dependencies, in which the immediate predecessor of the word under consideration is skipped. We analyze the computational complexity of the resulting training algorithm, the generalized iterative scaling (GIS) algorithm, and study the details of its implementation. We describe a method for handling unseen events in the maximum entropy approach by discounting the frequencies of observed events, and we study the effect of this discounting on the convergence of the GIS algorithm. We give experimental perplexity results for a corpus from the Wall Street Journal (WSJ) task. By using the maximum entropy approach and the distant bigram dependencies, we reduce the perplexity from 205.4 for our best conventional bigram model to 169.5.
Keywords :
computational complexity; convergence; iterative methods; maximum entropy methods; natural languages; nomograms; Wall Street Journal task; algorithm implementation; computational complexity; convergence; corpus; distant bigram dependencies; distant bigram language modelling; generalized iterative scaling algorithm; immediate predecessor word skipping; maximum entropy approach; observed event frequency discounting operation; perplexity reduction; training algorithm; unigram dependencies; unseen event handling; Algorithm design and analysis; Computational complexity; Convergence; Entropy; Frequency; Geographic Information Systems; History; Iterative algorithms; Iterative methods; Natural language processing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
ISSN :
1520-6149
Print_ISBN :
0-8186-7919-0
Type :
conf
DOI :
10.1109/ICASSP.1997.596045
Filename :
596045