• DocumentCode
    2267847
  • Title
    Classifying Documents with Maximum Likelihood Approximation of the Dirichlet Multinomial Gibbs Model
  • Author
    Zhou, Shibin ; Cao, Zhao ; Liu, Yushu
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol., Beijing
  • Volume
    3
  • fYear
    2008
  • fDate
    20-22 Dec. 2008
  • Firstpage
    71
  • Lastpage
    75
  • Abstract
    In text analysis, the Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because, unlike the standard multinomial distribution, it captures the phenomenon of word burstiness. In this paper, to improve document modeling performance, we propose the Dirichlet multinomial Gibbs (DMG) model, a variant of the DCM and Gibbs distributions obtained by introducing Gibbs parameters into the DCM distribution. We present the maximum likelihood procedure for the DMG model with these Gibbs parameters. In our experiments, the DMG approach inherits the merits of both Gibbs distribution approximation and DCM estimation. More specifically, our experimental results on various real-world text datasets show that maximum likelihood approximation of the DMG model is preferable to some current state-of-the-art methods.
  • Keywords
    classification; maximum likelihood estimation; statistical distributions; text analysis; DMG model; Dirichlet compound multinomial distribution; Dirichlet multinomial Gibbs model; document classification; maximum likelihood approximation; word burstiness phenomenon; Application software; Approximation methods; Computer science; Entropy; Frequency; Information technology; Testing; Text categorization; Maximum Likelihood
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3497-8
  • Type
    conf
  • DOI
    10.1109/IITA.2008.307
  • Filename
    4739961