• DocumentCode
    2850218
  • Title
    Analysis of consensus partition in cluster ensemble
  • Author
    Topchy, Alexander P.; Law, Martin H. C.; Jain, Anil K.; Fred, Ana L.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Michigan State Univ., USA
  • fYear
    2004
  • fDate
    1-4 Nov. 2004
  • Firstpage
    225
  • Lastpage
    232
  • Abstract
    When combining multiple partitions, one is usually interested in deriving a consensus solution whose quality is better than that of the given partitions. Several recent studies have empirically demonstrated improved accuracy of clustering ensembles on a number of artificial and real-world data sets. Unlike certain multiple supervised classifier systems, the convergence properties of unsupervised clustering ensembles remain unknown for conventional combination schemes. In this paper, we present formal arguments on the effectiveness of cluster ensembles from two perspectives. The first is based on a stochastic partition generation model related to re-labeling and a consensus function with plurality voting. The second studies the property of the "mean" partition of an ensemble with respect to a metric on the space of all possible partitions. In both cases, the consensus solution can be shown to converge to a true underlying clustering solution as the number of partitions in the ensemble increases. This paper provides a rigorous justification for the use of cluster ensembles.
  • Keywords
    data mining; pattern clustering; artificial data set; cluster ensemble; consensus function; consensus partition; convergence properties; mean partition; plurality voting; real-world data set; relabeling function; stochastic partition generation model; supervised classifier systems; unsupervised clustering ensembles; Algorithm design and analysis; Clustering algorithms; Computer science; Data mining; Labeling; Mutual information; Partitioning algorithms; Robustness; Stochastic processes; Voting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
  • Print_ISBN
    0-7695-2142-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2004.10100
  • Filename
    1410288
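
The first perspective named in the abstract (re-labeling followed by a plurality-voting consensus function) can be illustrated with a small simulation. The sketch below is not the paper's model or proof: the noise model in `noisy_partition`, the choice of the first partition as the alignment reference, the brute-force permutation search, and all function names and parameter values are assumptions made for illustration only.

```python
import random
from collections import Counter
from itertools import permutations


def noisy_partition(truth, k, p_correct, rng):
    """Assumed noise model (hypothetical): keep each true label with
    probability p_correct, otherwise pick a different label uniformly;
    then rename all labels with a random permutation."""
    perm = list(range(k))
    rng.shuffle(perm)
    out = []
    for t in truth:
        if rng.random() < p_correct:
            lab = t
        else:
            lab = rng.choice([c for c in range(k) if c != t])
        out.append(perm[lab])
    return out


def agreement(mapping, partition, reference):
    """Number of objects on which the relabeled partition matches reference."""
    return sum(mapping[a] == b for a, b in zip(partition, reference))


def relabel(partition, reference, k):
    """Re-labeling step: brute-force search over all k! label permutations
    for the one that best matches the reference (only feasible for small k)."""
    best = max(permutations(range(k)),
               key=lambda m: agreement(m, partition, reference))
    return [best[a] for a in partition]


def plurality_consensus(partitions, k):
    """Align every partition to the first one, then take the most
    frequent label per object (plurality vote)."""
    reference = partitions[0]
    aligned = [relabel(p, reference, k) for p in partitions]
    return [Counter(column).most_common(1)[0][0] for column in zip(*aligned)]


def accuracy(partition, truth, k):
    """Fraction of objects labeled correctly under the best label permutation."""
    best = max(agreement(m, partition, truth) for m in permutations(range(k)))
    return best / len(truth)


def ensemble_accuracy(num_partitions, n=90, k=3, p_correct=0.7, seed=0):
    """Accuracy of the consensus of num_partitions simulated partitions."""
    rng = random.Random(seed)
    truth = [i % k for i in range(n)]
    parts = [noisy_partition(truth, k, p_correct, rng)
             for _ in range(num_partitions)]
    return accuracy(plurality_consensus(parts, k), truth, k)
```

Calling `ensemble_accuracy` with increasing ensemble sizes (for example 1, 5, 21, 51) gives a rough empirical view of the trend the abstract describes, under the assumed noise model; it does not substitute for the formal convergence arguments in the paper.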