  • DocumentCode
    445820
  • Title
    Multinomial PCA for extracting major latent topics from document streams
  • Author
    Kimura, Masahiro ; Saito, Kazumi ; Ueda, Naonori
  • Author_Institution
    NTT Commun. Sci. Labs, Kyoto, Japan
  • Volume
    1
  • fYear
    2005
  • fDate
    31 July-4 Aug. 2005
  • Firstpage
    238
  • Abstract
    We propose a new unsupervised learning method, multinomial PCA (MuPCA), for efficiently extracting the major latent topics from a document stream, based on the "bag-of-words" (BOW) representation of a document. Unlike PCA, MuPCA is based on a probabilistic generative model suited to a document stream represented as a time series of word-frequency vectors. Using real document-stream data from the Web, we experimentally demonstrate the effectiveness of the proposed method.
  • Keywords
    document handling; principal component analysis; unsupervised learning; bag-of-words representation; document stream; document streams; latent topic extraction; multinomial PCA; probabilistic generative model; unsupervised learning; word-frequency vectors; Blogs; Computational intelligence; Data mining; Frequency; Gaussian distribution; Humans; Natural language processing; Principal component analysis; Unsupervised learning; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on
  • Print_ISBN
    0-7803-9048-2
  • Type
    conf
  • DOI
    10.1109/IJCNN.2005.1555836
  • Filename
    1555836