• DocumentCode
    2068783
  • Title
    Statistical consensus method for cluster ensembles
  • Author
    Deus, Clement; Liao, Zhifang
  • Author_Institution
    Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
  • Volume
    1
  • fYear
    2010
  • fDate
    10-12 Dec. 2010
  • Firstpage
    185
  • Lastpage
    189
  • Abstract
    In data mining and knowledge discovery, clustering is one of the most important techniques for discovering salient structures in data. This paper explores a statistical consensus method for combining the results of multiple clusterings, or partitions. We explored this idea while working with customs data from a revenue authority. The partitions are generated by running the k-means algorithm several times on the same data, each time with a different parameter initialization or subspace, which produces diverse clustering results. To combine them into the final clustering result, our algorithm first selects as a reference partition the one with the best clustering results among the created partitions. It then selects consistent partitions, using the mutual information between partitions as the selection criterion: partitions whose mutual information is less than a set threshold value are discarded from the ensemble. Finally, the selected partitions that make up the ensemble are combined by the consensus function to obtain the final clustering result. Our consensus function uses the original features of the dataset together with the partition results to attain the final clustering. Experiments show that our algorithm achieves better clustering results than the classical k-means algorithm in terms of accuracy on both synthetic and real datasets.
  • Keywords
    data mining; pattern clustering; set theory; statistical analysis; cluster ensembles; consensus function; k-means algorithm; knowledge discovery; multiple clustering; reference partition; revenue authority; statistical consensus method; Blood; Diabetes; Iris; Noise; Clustering; Ensembles; Statistical Consensus; Unsupervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Progress in Informatics and Computing (PIC), 2010 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-6788-4
  • Type
    conf
  • DOI
    10.1109/PIC.2010.5687411
  • Filename
    5687411
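
The partition-selection step described in the abstract (discarding partitions whose mutual information with the reference falls below a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mutual_information` and `select_consistent`, the use of natural-log mutual information, the toy label vectors, and the threshold value of 0.5 are all assumptions made for illustration. How the reference partition is chosen and how the consensus function works are not specified in the abstract, so the sketch takes the reference as given and stops after selection.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two partitions of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same number of points")
    n = len(labels_a)
    count_a = Counter(labels_a)                   # cluster sizes in partition A
    count_b = Counter(labels_b)                   # cluster sizes in partition B
    count_ab = Counter(zip(labels_a, labels_b))   # joint co-occurrence counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def select_consistent(reference, partitions, threshold):
    """Keep partitions whose MI with the reference is at least `threshold`."""
    return [p for p in partitions
            if mutual_information(reference, p) >= threshold]


# Hypothetical toy data: one reference partition and two candidates.
reference = [0, 0, 0, 1, 1, 1]
candidates = [
    [1, 1, 1, 0, 0, 0],  # same grouping as the reference, labels permuted
    [0, 1, 0, 1, 0, 1],  # grouping largely unrelated to the reference
]
kept = select_consistent(reference, candidates, threshold=0.5)
print(kept)
```

Because mutual information depends only on how points are grouped, not on the label values, the first candidate is kept even though its cluster labels are swapped relative to the reference; the second candidate shares little information with the reference and is discarded.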