Title :
HLDA based text clustering
Author :
Pingan Liu ; Lei Li ; Wei Heng ; Boyuan Wang
Author_Institution :
Center for Intell. Sci. & Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
fDate :
Oct. 30 2012-Nov. 1 2012
Abstract :
The LDA (Latent Dirichlet Allocation) topic model has been applied to many tasks in recent years. However, LDA has a shortcoming: it cannot adapt well to changes in the data set, since the number of topics must be fixed in advance, and this limits its applications. Hierarchical Latent Dirichlet Allocation (hLDA) is a generalization of LDA that can automatically adapt to a growing data set. hLDA can mine latent topics from a large amount of discrete data and organize these topics into a hierarchy, in which higher-level topics are more abstract and lower-level topics are more specific. Such a hierarchy can provide a deeper semantic model, similar to the way humans organize knowledge. Given a set of documents, hLDA places a Bayesian nonparametric prior on the topic hierarchy using a nested Chinese restaurant process (nCRP) [1]. Documents that share similar topics are assigned to the same path in the hierarchy, so each path forms a cluster. hLDA learns the topic distributions through Bayesian posterior inference. This paper studies the hLDA model in detail and applies it to Chinese text clustering. Experiments show that hLDA is a promising model for text clustering.
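The nCRP prior described in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes a tree of fixed depth and a single concentration parameter `gamma`, and the function names `sample_ncrp_paths` and `cluster_by_path` are hypothetical. Each document starts at the root and, at every level, either follows an existing child branch with probability proportional to the number of earlier documents that took it, or opens a new branch with probability proportional to `gamma`. Documents that end up on the same root-to-leaf path form one cluster.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a nested CRP prior.

    Hypothetical helper for illustration. A path is returned as a tuple of
    child indices, one per level below the root, so (0, 2) means:
    root -> child 0 -> that child's child 2.
    """
    rng = random.Random(seed)
    visits = defaultdict(int)      # node (tuple) -> documents that passed through it
    n_children = defaultdict(int)  # node -> number of child branches opened so far
    paths = []
    for _ in range(num_docs):
        node = ()  # every document shares the root (most abstract) topic
        for _level in range(depth - 1):
            k = n_children[node]
            # CRP rule: existing child i has weight = its visit count,
            # a brand-new child has weight gamma.
            weights = [visits[node + (i,)] for i in range(k)] + [gamma]
            choice = rng.choices(range(k + 1), weights=weights)[0]
            if choice == k:
                n_children[node] += 1  # open a new branch at this node
            node = node + (choice,)
            visits[node] += 1
        paths.append(node)
    return paths


def cluster_by_path(paths):
    """Group document indices by the path they were assigned to."""
    clusters = defaultdict(list)
    for doc_id, path in enumerate(paths):
        clusters[path].append(doc_id)
    return dict(clusters)


if __name__ == "__main__":
    demo_paths = sample_ncrp_paths(num_docs=20, depth=3, gamma=1.0, seed=42)
    for path, docs in sorted(cluster_by_path(demo_paths).items()):
        print(path, docs)
```

Larger `gamma` values open new branches more often, which tends to yield more and smaller clusters. This sketch only samples from the prior. In the commonly used Gibbs-sampling inference for hLDA, each document's path is instead resampled in proportion to this prior times the likelihood of the document's words under the topics along that path.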
Keywords :
Bayes methods; natural language processing; nonparametric statistics; pattern clustering; statistical distributions; text analysis; Bayesian nonparametric distribution; Bayesian posterior inference; Chinese text clustering; HLDA based text clustering; LDA topic model; discrete data; document sharing; hierarchical Latent Dirichlet allocation; nCRP; nested Chinese restaurant process; semantic model; topic distribution; Educational institutions; Entropy; Fitting; Large scale integration; Markov processes; Resource management; Bayesian nonparametrics; hierarchical latent Dirichlet allocation (HLDA); nested Chinese restaurant process (nCRP); text clustering;
Conference_Title :
Cloud Computing and Intelligent Systems (CCIS), 2012 IEEE 2nd International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4673-1855-6
DOI :
10.1109/CCIS.2012.6664628