• DocumentCode
    1384093
  • Title
    Count Data Modeling and Classification Using Finite Mixtures of Distributions
  • Author
    Bouguila, Nizar
  • Author_Institution
    Concordia Inst. for Inf. Syst. Eng., Concordia Univ., Montreal, QC, Canada
  • Volume
    22
  • Issue
    2
  • fYear
    2011
  • Firstpage
    186
  • Lastpage
    198
  • Abstract
    In this paper, we consider the problem of constructing accurate and flexible statistical representations for count data, which arise in many areas such as data mining, computer vision, and information retrieval. In particular, we analyze and compare several generative approaches widely used for count data clustering, namely multinomial, multinomial Dirichlet, and multinomial generalized Dirichlet mixture models. Moreover, we propose a clustering approach via a mixture model based on a composition of the multinomial and the Liouville family of distributions, from which we select the Beta-Liouville distribution. The proposed model, which we call the multinomial Beta-Liouville mixture, is optimized by deterministic annealing expectation-maximization and minimum description length, and aims at high accuracy in both count data clustering and model selection. An important feature of the multinomial Beta-Liouville mixture is that it has fewer parameters than the recently proposed multinomial generalized Dirichlet mixture. The performance evaluation is conducted through a set of extensive empirical experiments concerning text and image texture modeling and classification, as well as shape modeling, and it highlights the merits of the proposed models and approaches.
  • Keywords
    Liouville equation; expectation-maximisation algorithm; feature extraction; image texture; pattern clustering; solid modelling; text analysis; Beta Liouville distribution; count data modeling; data classification; data clustering; expectation maximization approach; finite distribution mixture; image classification; image texture modeling; multinomial generalized Dirichlet mixture; shape modeling; Annealing; Computational modeling; Data mining; Data models; Equations; Shape; Count data; Dirichlet; Fisher kernel; Liouville; deterministic annealing expectation-maximization; finite mixture models; generalized Dirichlet; model selection; multinomial; support vector machine; text categorization; texture classification; Algorithms; Artificial Intelligence; Automatic Data Processing; Computer Simulation; Data Mining; Humans; Mathematical Concepts; Models, Theoretical; Neural Networks (Computer); Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type
    jour
  • DOI
    10.1109/TNN.2010.2091428
  • Filename
    5640674