Title :
A Joint Topical N-Gram Language Model Based on LDA
Author :
Lin, Xiaojun; Li, Dan; Wu, Xihong
Author_Institution :
Speech and Hearing Research Center, Peking University, Beijing, China
Abstract :
In this paper, we propose a novel joint topical n-gram language model that combines semantic topic information with local n-gram constraints during training. Instead of training the n-gram language model and the topic model independently, we estimate the joint probability of the latent semantic topic and the n-gram directly. In this procedure, latent Dirichlet allocation (LDA) is employed to compute latent topic distributions for sentence instances. Our model not only captures long-range dependencies but also distinguishes the probability distribution of each n-gram across different topics without incurring data sparseness. Experiments show that our model lowers perplexity significantly and is robust to the number of topics and the scale of the training data.
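The topic-conditioned mixture implied by the abstract can be illustrated with a small sketch. The Python toy below is a minimal sketch, not the authors' estimation procedure: it assumes hard-coded LDA sentence-topic posteriors (the `corpus` data, the add-alpha smoothing, and all names are illustrative assumptions) and realizes a joint topic/n-gram estimate by sharing each bigram count across topics in proportion to the sentence's posterior, then scoring words as p(w | prev, s) = sum_k p(k | s) p(w | prev, k).

```python
import math
from collections import defaultdict

# Toy corpus: each sentence is paired with a hypothetical LDA topic
# posterior p(k | sentence). In the paper these posteriors come from a
# trained LDA model; here they are hard-coded for illustration.
corpus = [
    (["the", "stock", "market", "rose"],    {0: 0.9, 1: 0.1}),
    (["the", "team", "won", "the", "game"], {0: 0.1, 1: 0.9}),
]

V = {w for sent, _ in corpus for w in sent}  # vocabulary

# Fractional topic-conditioned bigram counts: each bigram occurrence is
# split across topics in proportion to the sentence's topic posterior,
# one simple way to realize a "joint" topic/n-gram estimate.
bigram = defaultdict(float)   # (k, prev, w) -> fractional count
context = defaultdict(float)  # (k, prev)    -> fractional count
for sent, post in corpus:
    for prev, w in zip(sent, sent[1:]):
        for k, p_k in post.items():
            bigram[(k, prev, w)] += p_k
            context[(k, prev)] += p_k

def p_bigram(k, prev, w, alpha=0.1):
    """Add-alpha smoothed topic-conditioned bigram p(w | prev, k)."""
    return (bigram[(k, prev, w)] + alpha) / (context[(k, prev)] + alpha * len(V))

def p_word(prev, w, post):
    """Mixture over topics: p(w | prev, s) = sum_k p(k | s) * p(w | prev, k)."""
    return sum(p_k * p_bigram(k, prev, w) for k, p_k in post.items())

# Perplexity of a sentence under its own topic posterior.
sent, post = corpus[0]
logp = sum(math.log(p_word(prev, w, post)) for prev, w in zip(sent, sent[1:]))
print("perplexity:", math.exp(-logp / (len(sent) - 1)))
```

Fractionally sharing counts across topics, rather than hard-assigning each sentence to a single topic, keeps every topic-conditioned distribution supported by the whole corpus, which loosely mirrors the abstract's claim of distinguishing per-topic n-gram distributions without data sparseness.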
Keywords :
formal languages; statistical distributions; LDA; joint topical n-gram language model; latent Dirichlet allocation; latent topic distributions; probability distribution; semantic topic information; Adaptation model; Auditory system; Computer science; Computer science education; Context modeling; Distributed computing; Linear discriminant analysis; Natural languages; Speech processing
Conference_Title :
2010 2nd International Workshop on Intelligent Systems and Applications (ISA)
Conference_Location :
Wuhan, China
Print_ISBN :
978-1-4244-5872-1
Electronic_ISBN :
978-1-4244-5874-5
DOI :
10.1109/IWISA.2010.5473439