• DocumentCode
    3105556
  • Title
    Stability Region Based Expectation Maximization for Model-based Clustering
  • Author
    Reddy, Chandan K. ; Chiang, Hsiao-Dong ; Rajaratnam, Bala
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Cornell Univ., Ithaca, NY
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    522
  • Lastpage
    531
  • Abstract
    In spite of the initialization problem, the expectation-maximization (EM) algorithm is widely used for estimating the parameters in several data mining related tasks. Most popular model-based clustering techniques might yield poor clusters if the parameters are not initialized properly. To reduce the sensitivity to initial points, a novel algorithm for learning mixture models from multivariate data is introduced in this paper. The proposed algorithm takes advantage of TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) to compute neighborhood local maxima on the likelihood surface using stability regions. Basically, our method combines the advantages of traditional EM with those of the dynamic and geometric characteristics of the stability regions of the corresponding nonlinear dynamical system of the log-likelihood function. Two phases, namely the EM phase and the stability region phase, are repeated alternately in the parameter space to achieve improvements in the maximum likelihood. Though applied to Gaussian mixtures in this paper, our technique can be easily generalized to any other parametric finite mixture model. The algorithm has been tested on both synthetic and real datasets, and the improvements in performance compared to other approaches are demonstrated. The robustness with respect to initialization is also illustrated experimentally.
  • Keywords
    Gaussian processes; data mining; expectation-maximisation algorithm; learning (artificial intelligence); parameter estimation; pattern clustering; Gaussian mixture model learning algorithm; data mining; expectation-maximization algorithm; likelihood surface; log-likelihood function; model-based clustering; multivariate data; neighborhood local maxima; nonlinear dynamical system; parameter estimation; stability region phase; stability-retaining equilibria characterization; Clustering algorithms; Data mining; Maximum likelihood estimation; Newton method; Nonlinear dynamical systems; Parameter estimation; Robustness; Stability; Stochastic processes; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2006. ICDM '06. Sixth International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2701-7
  • Type
    conf
  • DOI
    10.1109/ICDM.2006.152
  • Filename
    4053078
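
The EM phase named in the abstract can be illustrated with a minimal sketch of standard EM for a one-dimensional, two-component Gaussian mixture. This is textbook EM only, not the paper's method: the stability region (TRUST-TECH) phase is not reproduced, since the record gives no details of it. All function names, the synthetic data, and the starting values below are assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture parameters."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, min_var=1e-6):
    """One EM iteration: compute responsibilities, then re-estimate parameters."""
    k = len(weights)
    # E-step: posterior probability of each component for each point.
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: responsibility-weighted maximum likelihood updates.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, min_var))  # guard against a collapsing component
    return new_w, new_m, new_v


def run_em(data, weights, means, variances, tol=1e-8, max_iter=500):
    """Iterate EM until the log-likelihood gain drops below tol."""
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(max_iter):
        weights, means, variances = em_step(data, weights, means, variances)
        history.append(log_likelihood(data, weights, means, variances))
        if history[-1] - history[-2] < tol:
            break
    return weights, means, variances, history


# Synthetic data (an assumption for this demo): two clusters centred at 0 and 5.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(5.0, 1.0) for _ in range(200)]

# Hypothetical starting point: means at the data extremes, unit variances.
weights, means, variances, history = run_em(
    data, [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
print("final log-likelihood:", history[-1])
```

The log-likelihood recorded in `history` does not decrease from one iteration to the next, which is the property that makes EM a local method: it climbs to whichever local maximum its starting point leads to. That dependence on the starting point is the initialization sensitivity the abstract sets out to reduce.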