DocumentCode :
445820
Title :
Multinomial PCA for extracting major latent topics from document streams
Author :
Kimura, Masahiro ; Saito, Kazumi ; Ueda, Naonori
Author_Institution :
NTT Communication Science Laboratories, Kyoto, Japan
Volume :
1
fYear :
2005
fDate :
31 July-4 Aug. 2005
Firstpage :
238
Abstract :
We propose a new unsupervised learning method called multinomial PCA (MuPCA) for efficiently extracting the major latent topics from a document stream, based on the "bag-of-words" (BOW) representation of a document. Unlike PCA, MuPCA follows a suitable probabilistic generative model for a document stream represented as a time series of word-frequency vectors. Using real document streams from the Web, we experimentally demonstrate the effectiveness of the proposed method.
Keywords :
document handling; principal component analysis; unsupervised learning; bag-of-words representation; document stream; document streams; latent topic extraction; multinomial PCA; probabilistic generative model; word-frequency vectors; Blogs; Computational intelligence; Data mining; Frequency; Gaussian distribution; Humans; Natural language processing; Principal component analysis; Unsupervised learning; Web sites
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN '05)
Print_ISBN :
0-7803-9048-2
Type :
conf
DOI :
10.1109/IJCNN.2005.1555836
Filename :
1555836