• DocumentCode
    2772194
  • Title
    Dirichlet Mixture Allocation for Multiclass Document Collections Modeling
  • Author
    Bian, Wei ; Tao, Dacheng
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    711
  • Lastpage
    715
  • Abstract
    Latent Dirichlet allocation (LDA), a topic model, is an effective tool for statistical analysis of large collections of documents. In LDA, each document is modeled as a mixture of topics, and the topic proportions are generated from a unimodal Dirichlet prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient for data fitting. To solve this problem, we exploit a multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 Corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents are obtained from multiple classes.
  • Keywords
    statistical analysis; text analysis; Dirichlet mixture allocation; TDT2 Corpus; data fitting; latent Dirichlet allocation; multiclass document collections modeling; multimodal Dirichlet mixture prior; text modeling; unimodal Dirichlet distribution prior; Bayesian methods; Data engineering; Data mining; Image retrieval; Indexing; Inference algorithms; Information retrieval; Linear discriminant analysis; Vocabulary; Dirichlet mixture; multiclass; topic model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.102
  • Filename
    5360299
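
The prior described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or inference algorithm; it only shows the generative idea of drawing topic proportions from a mixture of Dirichlet distributions rather than a single one. The function names, the toy topic-word matrix `beta`, the mixture `weights`, and the per-component parameters `alphas` are all hypothetical values chosen for illustration.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw one point from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_theta_dma(rng, weights, alphas):
    """Pick a mixture component, then draw topic proportions from its Dirichlet.

    With a single component this reduces to the unimodal prior used by LDA.
    """
    c = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return c, sample_dirichlet(rng, alphas[c])


def generate_document(rng, theta, beta, n_words):
    """Draw a topic per word from theta, then a word id from that topic."""
    vocab = range(len(beta[0]))
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta, k=1)[0]
        words.append(rng.choices(vocab, weights=beta[z], k=1)[0])
    return words


rng = random.Random(0)

# Hypothetical toy setup: 3 topics, 4 vocabulary words, 2 mixture components.
beta = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
weights = [0.5, 0.5]                       # assumed mixing proportions
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]  # one mode per assumed class

component, theta = sample_theta_dma(rng, weights, alphas)
doc = generate_document(rng, theta, beta, n_words=20)
```

Each component of `alphas` concentrates mass on a different region of the topic simplex, which is one way to picture a multimodal prior for documents drawn from several classes.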