• DocumentCode
    1446334
  • Title

    Efficient Clustering Aggregation Based on Data Fragments

  • Author

    Wu, Ou ; Hu, Weiming ; Maybank, Stephen J. ; Zhu, Mingliang ; Li, Bing

  • Author_Institution
    Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
  • Volume
    42
  • Issue
    3
  • fYear
    2012
  • fDate
    6/1/2012
  • Firstpage
    913
  • Lastpage
    926
  • Abstract
    Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single, better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms are inefficient when the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical basis of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. Experimental results obtained on several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch) without sacrificing accuracy.
  • Keywords
    computational complexity; pattern clustering; clustering ensembles; data fragments; data points; efficient clustering aggregation; Clustering algorithms; Correlation; Dispersion; Mutual information; Partitioning algorithms; Clustering aggregation; comparison measure; data fragment; fragment-based approach; point-based approach; Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Databases, Factual; Decision Support Techniques; Information Storage and Retrieval; Models, Theoretical; Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type

    jour

  • DOI
    10.1109/TSMCB.2012.2183591
  • Filename
    6151183
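
The abstract defines a data fragment as any subset of the data that is not split by any of the input clusterings. As a minimal sketch of that definition only (not the paper's three aggregation algorithms, which this record does not describe), the largest such subsets can be found by grouping points that carry the same label in every clustering. The function name `data_fragments` and the label-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def data_fragments(clusterings):
    """Group point indices that share a label in every input clustering.

    `clusterings` is a list of label sequences, one per clustering, all of
    the same length (hypothetical input format, not taken from the paper).
    Returns the maximal fragments as lists of point indices.
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("all clusterings must label the same points")

    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels a point receives across all clusterings.
        signature = tuple(labels[i] for labels in clusterings)
        # Points with identical signatures are never separated.
        groups[signature].append(i)
    return list(groups.values())


# Two clusterings of six points.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
fragments = data_fragments(clusterings)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

Here six points collapse to four fragments, so an aggregation step that operates on fragments would handle four items instead of six. The saving grows when the input clusterings agree on large groups of points.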