DocumentCode
2481839
Title
A Joint Topical N-Gram Language Model Based on LDA
Author
Lin, Xiaojun; Li, Dan; Wu, Xihong
Author_Institution
Speech and Hearing Research Center, Peking University, Beijing, China
fYear
2010
fDate
22-23 May 2010
Firstpage
1
Lastpage
4
Abstract
In this paper, we propose a novel joint topical n-gram language model that combines semantic topic information with local constraints during training. Instead of training the n-gram language model and the topic model independently, we directly estimate the joint probability of the latent semantic topic and the n-gram. In this procedure, Latent Dirichlet Allocation (LDA) is employed to compute latent topic distributions for sentence instances. Our model not only captures long-range dependencies, it also distinguishes the probability distribution of each n-gram across topics without introducing data-sparseness problems. Experiments show that our model significantly lowers perplexity and is robust to the number of topics and the scale of the training data.
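The abstract describes estimating the joint probability of latent topic and n-gram, with LDA supplying per-sentence topic distributions. Below is a minimal Python sketch of that idea, assuming gensim's LdaModel; the toy corpus, the bigram order, the add-alpha smoothing, and all variable names are illustrative assumptions, not the paper's actual estimation procedure.

```python
# Sketch of a topic-conditioned bigram model: LDA assigns each sentence a
# topic posterior, bigram counts are accumulated per topic weighted by that
# posterior, and prediction mixes the per-topic bigram probabilities.
from collections import defaultdict
from gensim.corpora import Dictionary
from gensim.models import LdaModel

sentences = [
    "the stock market fell sharply today".split(),
    "the team won the football match".split(),
    "investors sold shares as prices fell".split(),
    "the match ended after extra time".split(),
]

K = 2  # number of latent topics (the paper reports robustness to this choice)
dictionary = Dictionary(sentences)
corpus = [dictionary.doc2bow(s) for s in sentences]
lda = LdaModel(corpus, num_topics=K, id2word=dictionary,
               passes=20, random_state=0)

# Per-topic bigram counts, weighted by each sentence's topic posterior, so
# every bigram contributes softly to all K topics instead of being
# hard-assigned to one of them.
bigram = [defaultdict(lambda: defaultdict(float)) for _ in range(K)]
context = [defaultdict(float) for _ in range(K)]
for sent, bow in zip(sentences, corpus):
    topic_post = dict(lda.get_document_topics(bow, minimum_probability=0.0))
    for w1, w2 in zip(sent, sent[1:]):
        for t in range(K):
            p_t = topic_post.get(t, 0.0)
            bigram[t][w1][w2] += p_t
            context[t][w1] += p_t

def prob(w2, w1, topic_post, alpha=0.1):
    """Mixture over topics of add-alpha smoothed topical bigram probabilities."""
    V = len(dictionary)
    total = 0.0
    for t in range(K):
        p_w = (bigram[t][w1][w2] + alpha) / (context[t][w1] + alpha * V)
        total += topic_post.get(t, 0.0) * p_w
    return total

# Usage: score a word given its predecessor under a test sentence's topics.
test_bow = dictionary.doc2bow("prices fell sharply".split())
post = dict(lda.get_document_topics(test_bow, minimum_probability=0.0))
print(prob("fell", "prices", post))
```

The soft topic weighting lets every observed bigram contribute fractionally to all K topical distributions, which is one plausible reading of how the abstract's model distinguishes per-topic n-gram probabilities without the sparseness a hard per-topic split of the training data would cause.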
Keywords
formal languages; statistical distributions; LDA; joint topical n-gram language model; latent Dirichlet allocation; latent topic distributions; probability distribution; semantic topic information; Adaptation model; Auditory system; Computer science; Computer science education; Context modeling; Distributed computing; Linear discriminant analysis; Natural languages; Probability distribution; Speech processing
fLanguage
English
Publisher
IEEE
Conference_Title
2010 2nd International Workshop on Intelligent Systems and Applications (ISA)
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5872-1
Electronic_ISBN
978-1-4244-5874-5
Type
conf
DOI
10.1109/IWISA.2010.5473439
Filename
5473439
Link To Document