  • DocumentCode
    2369763
  • Title
    Combining multiple weak clusterings
  • Author
    Topchy, Alexander ; Jain, Anil K. ; Punch, William
  • Author_Institution
    Dept. of Comput. Sci., Michigan State Univ., East Lansing, MI, USA
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    331
  • Lastpage
    338
  • Abstract
    A data set can be clustered in many ways depending on the clustering algorithm employed, parameter settings used and other factors. Can multiple clusterings be combined so that the final partitioning of data provides better clustering? The answer depends on the quality of clusterings to be combined as well as the properties of the fusion method. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. As a result, we show that the consensus function is related to the classical intra-class variance criterion using the generalized mutual information definition. Second, we show the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. We analyze the combination accuracy as a function of parameters controlling the power and resolution of component partitions as well as the learning dynamics vs. the number of clusterings involved. Finally, some empirical studies compare the effectiveness of several consensus functions.
  • Keywords
    data mining; learning (artificial intelligence); pattern clustering; statistical analysis; categorical clustering problem; component partition; consensus function; data projection; data set; fusion method property; intra-class variance criterion; learning dynamics; multiple weak clustering algorithm; mutual information definition; parameter setting; random data split; Classification algorithms; Clustering algorithms; Computer science; Data mining; Fusion power generation; Mutual information; Partitioning algorithms; Robustness; Taxonomy; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2003.1250937
  • Filename
    1250937