DocumentCode
2772194
Title
Dirichlet Mixture Allocation for Multiclass Document Collections Modeling
Author
Bian, Wei ; Tao, Dacheng
Author_Institution
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2009
fDate
6-9 Dec. 2009
Firstpage
711
Lastpage
715
Abstract
The topic model latent Dirichlet allocation (LDA) is an effective tool for statistical analysis of large collections of documents. In LDA, each document is modeled as a mixture of topics, and the topic proportions are generated from the unimodal Dirichlet distribution prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient for data fitting. To solve this problem, we exploit the multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 Corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents are obtained from multiple classes.
Keywords
statistical analysis; text analysis; Dirichlet mixture allocation; TDT2 Corpus; data fitting; latent Dirichlet allocation; multiclass document collections modeling; multimodal Dirichlet mixture prior; text modeling; unimodal Dirichlet distribution prior; Bayesian methods; Data engineering; Data mining; Image retrieval; Indexing; Inference algorithms; Information retrieval; Linear discriminant analysis; Vocabulary; Dirichlet mixture; multiclass; topic model
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location
Miami, FL
ISSN
1550-4786
Print_ISBN
978-1-4244-5242-2
Electronic_ISBN
1550-4786
Type
conf
DOI
10.1109/ICDM.2009.102
Filename
5360299