• DocumentCode
    2711038
  • Title
    Collective Latent Dirichlet Allocation
  • Author
    Shen, Zhi-Yong ; Sun, Jun ; Shen, Yi-Dong
  • Author_Institution
    State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    1019
  • Lastpage
    1024
  • Abstract
    In this paper, we propose a new variant of latent Dirichlet allocation (LDA): Collective LDA (C-LDA), for modeling multiple corpora. C-LDA combines multiple corpora during learning so that it can transfer knowledge from one corpus to another; meanwhile, it keeps a discriminative node representing the corpus ID to constrain the topics learned in each corpus. Compared with LDA applied locally to the target corpus, C-LDA yields refined topic-word distributions; compared with applying LDA globally and straightforwardly to the combined corpus, C-LDA keeps each topic specific to one corpus. Experiments on several benchmark document data sets demonstrate that these advantages improve C-LDA's performance.
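    The abstract's core idea can be illustrated with a toy sketch: pool all corpora into one count table so statistics are shared, but let each document sample only from the topic block reserved for its corpus. This is a minimal illustration under assumed details, not the paper's implementation; collapsed Gibbs sampling, the hard per-corpus topic partition, and all names (`collective_lda_gibbs`, `topics_per_corpus`) are assumptions for exposition.

    ```python
    import random

    def collective_lda_gibbs(docs, corpus_ids, topics_per_corpus=2,
                             alpha=0.1, beta=0.01, iters=200, seed=0):
        """Toy collapsed Gibbs sampler in the spirit of C-LDA: corpora are
        pooled for counting, but each document may only use the topic block
        reserved for its own corpus (the corpus-ID constraint)."""
        rng = random.Random(seed)
        n_corpora = max(corpus_ids) + 1
        K = topics_per_corpus * n_corpora          # total topics, partitioned by corpus
        vocab = sorted({w for d in docs for w in d})
        V = len(vocab)
        widx = {w: i for i, w in enumerate(vocab)}

        ndk = [[0] * K for _ in docs]              # document-topic counts
        nkw = [[0] * V for _ in range(K)]          # topic-word counts (shared pool)
        nk = [0] * K                               # topic totals
        z = []                                     # topic assignment per token

        def block(c):
            # topics permitted for documents from corpus c
            return range(c * topics_per_corpus, (c + 1) * topics_per_corpus)

        # random initialization, restricted to each document's corpus block
        for d, doc in enumerate(docs):
            zd = []
            for w in doc:
                k = rng.choice(list(block(corpus_ids[d])))
                zd.append(k)
                ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
            z.append(zd)

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]
                    ndk[d][k] -= 1; nkw[k][widx[w]] -= 1; nk[k] -= 1
                    # resample only within this corpus's topic block
                    ks, weights = [], []
                    for kk in block(corpus_ids[d]):
                        p = (ndk[d][kk] + alpha) * (nkw[kk][widx[w]] + beta) \
                            / (nk[kk] + V * beta)
                        ks.append(kk); weights.append(p)
                    k = rng.choices(ks, weights=weights)[0]
                    z[d][i] = k
                    ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
        return nkw, vocab
    ```

    Because sampling never leaves a document's topic block, each learned topic belongs to exactly one corpus, while the shared vocabulary indexing lets counts from overlapping words inform all corpora.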
  • Keywords
    classification; document handling; learning (artificial intelligence); collective latent Dirichlet allocation; document classification; knowledge transfer; machine learning; multiple corpora modeling; topic-word distribution; Computer science; Content based retrieval; Data mining; Information retrieval; Laboratories; Linear discriminant analysis; Machine learning; Natural language processing; Text mining; Web pages; collective LDA
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2008 Eighth IEEE International Conference on Data Mining (ICDM '08)
  • Conference_Location
    Pisa, Italy
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type
    conf
  • DOI
    10.1109/ICDM.2008.75
  • Filename
    4781218