Title :
Document clustering using mixture model of von Mises-Fisher distributions on document manifold
Author :
Nguyen Kim Anh ; Nguyen The Tam ; Ngo Van Linh
Author_Institution :
Hanoi Univ. of Sci. & Technol., Hanoi, Vietnam
Abstract :
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. Generative models for document clustering based on the von Mises-Fisher (vMF) distribution generally produce better clustering results than other generative models. However, it is more natural to assume that the document space is a manifold and that the probability distribution generating the data is supported on this document manifold. In this paper, we propose a regularized probabilistic model for data clustering, called the Laplacian regularized vMF Mixture Model (LapvMFs), which explicitly takes the manifold structure into account. We develop a generalized mean-field variational inference algorithm for LapvMFs. Extensive experiments on a large number of high-dimensional text datasets demonstrate that our approach outperforms three state-of-the-art clustering algorithms.
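The two ingredients named in the abstract, vMF-based soft cluster assignments and a graph Laplacian smoothness term, can be illustrated with a minimal sketch. This is not the authors' LapvMFs model or its variational inference algorithm. It assumes equal mixing weights, a single shared concentration `kappa` (so the vMF normalizing constant cancels), a symmetric k-nearest-neighbour graph under cosine similarity, and a penalty of the form 0.5 * sum_ij W_ij * ||p_i - p_j||^2. All function names and the toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length; vMF distributions live on the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def soft_assignments(docs, means, kappa):
    """Responsibilities for equal-weight vMF components sharing one kappa.
    Assumption: with a shared kappa the normalizer cancels, leaving a
    softmax over kappa * cosine similarity."""
    out = []
    for x in docs:
        logits = [kappa * dot(x, mu) for mu in means]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        out.append([wi / s for wi in w])
    return out

def knn_graph(docs, k):
    """Symmetric 0/1 adjacency: i and j are linked if either one is among
    the other's k most cosine-similar documents."""
    n = len(docs)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: -dot(docs[i], docs[j]))
        for j in others[:k]:
            W[i][j] = W[j][i] = 1.0
    return W

def laplacian_penalty(P, W):
    """0.5 * sum_ij W_ij * ||p_i - p_j||^2, which equals trace(P^T L P)
    for the graph Laplacian L = D - W of a symmetric W."""
    n = len(P)
    return 0.5 * sum(
        W[i][j] * sum((a - b) ** 2 for a, b in zip(P[i], P[j]))
        for i in range(n) for j in range(n)
    )

# Toy usage with made-up 3-dimensional "documents" and two cluster means.
docs = [normalize(v) for v in [[1, 0.1, 0], [0.9, 0.2, 0], [0, 0.1, 1], [0, 0.2, 0.9]]]
means = [normalize([1, 0, 0]), normalize([0, 0, 1])]
P = soft_assignments(docs, means, kappa=10.0)
W = knn_graph(docs, k=1)
penalty = laplacian_penalty(P, W)
```

The penalty is small when neighbouring documents on the graph receive similar cluster posteriors, which is the general idea behind Laplacian regularization. How the paper weights this term against the mixture likelihood is not stated in the abstract.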
Keywords :
data mining; mixture models; pattern clustering; statistical distributions; text analysis; Laplacian regularized vMF mixture model; LapvMF; document clustering; document manifold; probability distribution; text mining; von Mises-Fisher distribution; Clustering algorithms; Data models; Equations; Laplace equations; Manifolds; Mathematical model; Vectors; Probabilistic graphical model; clustering; graph Laplacian; manifold; variational inference
Conference_Title :
Soft Computing and Pattern Recognition (SoCPaR), 2013 International Conference of
Conference_Location :
Hanoi
Print_ISBN :
978-1-4799-3399-0
DOI :
10.1109/SOCPAR.2013.7054116