DocumentCode
445820
Title
Multinomial PCA for extracting major latent topics from document streams
Author
Kimura, Masahiro ; Saito, Kazumi ; Ueda, Naonori
Author_Institution
NTT Commun. Sci. Labs, Kyoto, Japan
Volume
1
fYear
2005
fDate
31 July-4 Aug. 2005
Firstpage
238
Abstract
We propose a new unsupervised learning method, multinomial PCA (MuPCA), for efficiently extracting the major latent topics from a document stream, based on the "bag-of-words" (BOW) representation of a document. Unlike standard PCA, MuPCA is built on a probabilistic generative model suited to a document stream represented as a time series of word-frequency vectors. Using real document-stream data from the Web, we experimentally demonstrate the effectiveness of the proposed method.
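The abstract takes as input a document stream represented as a time series of bag-of-words word-frequency vectors. The sketch below shows only how such an input could be built. It does not implement MuPCA. Several choices are assumptions, not taken from the paper: documents arrive as `(time_step, text)` pairs, all documents sharing a time step are pooled into one vector, tokens are lowercase alphabetic runs, and the names `tokenize` and `word_frequency_series` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def word_frequency_series(stream, vocabulary=None):
    """Build a bag-of-words time series from a document stream.

    stream: iterable of (time_step, document_text) pairs (assumed input format).
    Returns (vocabulary, series), where series[i] is the word-count vector
    for the i-th time step in sorted order, indexed by vocabulary.
    """
    counts_by_step = {}
    for t, text in stream:
        # Pool the word counts of every document that shares a time step.
        counts_by_step.setdefault(t, Counter()).update(tokenize(text))
    if vocabulary is None:
        # Default vocabulary: every word seen anywhere in the stream, sorted.
        vocabulary = sorted(set().union(*counts_by_step.values()))
    steps = sorted(counts_by_step)
    # Counter returns 0 for absent words, which gives dense count vectors.
    series = [[counts_by_step[t][w] for w in vocabulary] for t in steps]
    return vocabulary, series
```

The vectors hold raw counts rather than normalized frequencies. A multinomial generative model, which is the kind of model the abstract contrasts with standard PCA, treats each vector as word counts drawn from a distribution over the vocabulary.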
Keywords
document handling; principal component analysis; unsupervised learning; bag-of-words representation; document stream; document streams; latent topic extraction; multinomial PCA; probabilistic generative model; unsupervised learning; word-frequency vectors; Blogs; Computational intelligence; Data mining; Frequency; Gaussian distribution; Humans; Natural language processing; Principal component analysis; Unsupervised learning; Web sites;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on
Print_ISBN
0-7803-9048-2
Type
conf
DOI
10.1109/IJCNN.2005.1555836
Filename
1555836