Title :
Smoothing of ngram language models of human chats
Abstract :
N-gram language models are ubiquitous in speech applications and many other natural language systems. One issue with n-gram language models is that no model represents the language completely. When words appear that are not in the model, a smoothing method is needed to redistribute some of the model's probability mass to those unknown words. Many smoothing techniques exist, each with different performance characteristics. The performance of a smoothing algorithm often depends on the application of the language model; for example, unigram models with interpolation smoothing may perform better in information retrieval applications, whereas trigram models with backoff smoothing might perform better for speech. This paper examines the relative performance of selected smoothing methods with bigram language models built from chat data. The language models are used for machine translation of chat data and for creating text classification models.
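As an illustration of the interpolation smoothing mentioned in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes a fixed interpolation weight `lam`, an add-one smoothed unigram fallback, and a single extra vocabulary slot for unknown words. All function and variable names are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized chat lines."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def interpolated_prob(prev, word, unigrams, bigrams, lam=0.7):
    """P(word | prev) as a fixed-weight mix of bigram and unigram estimates.

    Assumption: lam is a hand-picked constant; in practice it would be
    tuned on held-out data.
    """
    vocab_size = len(unigrams) + 1          # +1 slot for unknown words
    total = sum(unigrams.values())
    # Add-one smoothed unigram estimate, nonzero even for unseen words.
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    if unigrams[prev] == 0:
        # Unseen context: rely on the unigram estimate alone.
        return p_uni
    p_bi = bigrams[(prev, word)] / unigrams[prev]
    return lam * p_bi + (1 - lam) * p_uni

# Toy chat data, invented for illustration only.
chats = [["hi", "there"], ["hi", "you"]]
uni, bi = train_bigram(chats)
seen = interpolated_prob("hi", "there", uni, bi)
unseen = interpolated_prob("hi", "lol", uni, bi)
```

Here the unseen pair ("hi", "lol") still receives a small nonzero probability, taken from the mass that an unsmoothed bigram model would assign entirely to observed continuations of "hi".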
Keywords :
language translation; natural language processing; pattern classification; smoothing methods; text analysis; backoff smoothing; bigram language models; chat data; human chats; information retrieval applications; interpolation smoothing; language model smoothing; model probabilities; natural language systems; ngram language models; smoothing algorithms; speech applications; text classification models; trigram models; unigram models
Conference_Title :
Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on
Conference_Location :
Kobe
Print_ISBN :
978-1-4673-2742-8
DOI :
10.1109/SCIS-ISIS.2012.6505411