  • DocumentCode
    1206099
  • Title
    Clustering ensembles: models of consensus and weak partitions
  • Author
    Topchy, Alexander; Jain, Anil K.; Punch, William
  • Author_Institution
    Nielsen Media Res., Oldsmar, FL, USA
  • Volume
    27
  • Issue
    12
  • fYear
    2005
  • Firstpage
    1866
  • Lastpage
    1881
  • Abstract
    Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial, or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum-likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intraclass variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world data sets.
  • Keywords
    maximum likelihood estimation; pattern classification; pattern clustering; statistical distributions; clustering ensembles; consensus clustering; maximum-likelihood problem; multinomial distributions; unsupervised classification; weak partitions; Clustering algorithms; Data models; Data visualization; Information analysis; Mutual information; Noise robustness; Partitioning algorithms; Robust stability; Sampling methods; Uncertainty; Clustering; consensus function; ensembles; multiple classifier systems; mutual information; Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Image Enhancement; Image Interpretation, Computer-Assisted; Imaging, Three-Dimensional; Models, Biological; Models, Statistical; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2005.237
  • Filename
    1524981