DocumentCode
2009566
Title
Smoothing of ngram language models of human chats
Author
Dumoulin, J.
fYear
2012
fDate
20-24 Nov. 2012
Firstpage
1
Lastpage
4
Abstract
N-gram language models are ubiquitous in speech applications and many other natural language systems. One issue with n-gram language models is that the language is never completely represented in the model. When words appear that are not in the model, a smoothing method is needed to redistribute the model's probability mass so that these unknown values receive nonzero probability. Many smoothing techniques exist, each with different performance characteristics. The best-performing smoothing algorithm often depends on the application of the language model; for example, unigram models with interpolation smoothing may perform better in information retrieval applications, whereas trigram models with backoff smoothing might perform better for speech. This paper examines the relative performance of selected smoothing methods applied to bigram language models built from chat data. The language models are used for machine translation of chat data and for creating text classification models.
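To make "interpolation smoothing" concrete, the sketch below builds a bigram model that linearly interpolates (Jelinek-Mercer style) the maximum-likelihood bigram estimate with an add-one-smoothed unigram estimate, so that a word never seen in training still gets nonzero probability. This is an illustrative assumption, not the specific set of methods evaluated in the paper: the class name `InterpolatedBigramModel`, the weight `lam`, the `<s>`/`</s>` padding tokens, and the single reserved slot for unknown words are all hypothetical choices.

```python
from collections import Counter


class InterpolatedBigramModel:
    """Hypothetical bigram LM with Jelinek-Mercer interpolation.

    P(w | v) = lam * P_ML(w | v) + (1 - lam) * P_add1(w)

    The unigram term uses add-one smoothing over the training vocabulary
    plus one reserved slot that stands for all unknown words, so unseen
    words receive nonzero probability.
    """

    def __init__(self, sentences, lam=0.7):
        self.lam = lam
        self.unigrams = Counter()        # counts of predicted tokens
        self.bigrams = Counter()         # counts of (previous, current) pairs
        self.context_counts = Counter()  # how often each token is a context
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            # <s> is only ever a context, never a predicted token
            self.unigrams.update(padded[1:])
            for v, w in zip(padded, padded[1:]):
                self.bigrams[(v, w)] += 1
                self.context_counts[v] += 1
        self.total = sum(self.unigrams.values())
        # Known vocabulary plus one slot for the unknown-word class
        self.vocab_size = len(self.unigrams) + 1

    def unigram_prob(self, w):
        """Add-one (Laplace) smoothed unigram probability."""
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size)

    def prob(self, w, v):
        """Interpolated probability of word w following context v."""
        ctx = self.context_counts[v]
        if ctx == 0:
            # Unseen context: the bigram estimate is undefined, so rely
            # entirely on the smoothed unigram distribution.
            return self.unigram_prob(w)
        ml = self.bigrams[(v, w)] / ctx
        return self.lam * ml + (1 - self.lam) * self.unigram_prob(w)


if __name__ == "__main__":
    chats = [["hi", "there"], ["hi", "all"], ["hi", "there", "again"]]
    model = InterpolatedBigramModel(chats, lam=0.7)
    print(model.prob("there", "hi"))  # seen bigram
    print(model.prob("bye", "hi"))    # unseen word, still > 0
```

Interpolation always mixes in the lower-order estimate, whereas a backoff scheme consults the unigram distribution only when the bigram count is zero; in practice `lam` would be tuned on held-out chat data rather than fixed at 0.7.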
Keywords
language translation; natural language processing; pattern classification; smoothing methods; text analysis; backoff smoothing; bigram language models; chat data; human chats; information retrieval applications; interpolation smoothing; language model smoothing; model probabilities; natural language systems; ngram language models; smoothing algorithms; speech applications; text classification models; trigram models; unigram models;
fLanguage
English
Publisher
IEEE
Conference_Titel
2012 Joint 6th International Conference on Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS)
Conference_Location
Kobe
Print_ISBN
978-1-4673-2742-8
Type
conf
DOI
10.1109/SCIS-ISIS.2012.6505411
Filename
6505411