DocumentCode :
1290631
Title :
A survey of smoothing techniques for ME models
Author :
Chen, Stanley F. ; Rosenfeld, Ronald
Author_Institution :
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Volume :
8
Issue :
1
fYear :
2000
fDate :
1/1/2000
Firstpage :
37
Lastpage :
50
Abstract :
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood (ML) training for exponential models and, like other ML methods, is prone to overfitting the training data. Several smoothing methods for ME models have been proposed to address this problem, but previous results do not make clear how these methods compare with smoothing methods for other types of related models. In this work, we survey previous work in ME smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between ME and conventional n-gram models, this domain is well suited for gauging the performance of ME smoothing methods. Over a large number of data sets, we find that fuzzy ME smoothing performs as well as or better than all other algorithms under consideration. We contrast this method with previous n-gram smoothing methods to explain its superior performance.
Keywords :
computational linguistics; maximum entropy methods; maximum likelihood estimation; natural languages; probability; ME models; data sets; exponential models; fuzzy smoothing; maximum entropy modeling; maximum likelihood training; n-gram language models; performance evaluation; smoothing techniques; training data overfitting; Associate members; Computer science; Context modeling; Entropy; Fuzzy sets; Glass; Natural languages; Performance evaluation; Smoothing methods; Training data;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.817452
Filename :
817452