DocumentCode
2068783
Title
Statistical consensus method for cluster ensembles
Author
Deus, Clement ; Liao, Zhifang
Author_Institution
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
Volume
1
fYear
2010
fDate
10-12 Dec. 2010
Firstpage
185
Lastpage
189
Abstract
In data mining and knowledge discovery, clustering is one of the most important techniques for discovering salient structures in data. This paper explores a statistical consensus method for combining the results of multiple clusterings, or partitions. We explored this idea while working with customs data from a revenue authority. The partitions are generated by running the k-means algorithm several times on the same data, each time with a different parameter initialization or subspace, which produces diverse clustering results. To combine them into the final clustering result, our algorithm first selects as the reference partition the one with the best clustering result among the generated partitions. It then selects consistent partitions, using the mutual information between partitions as the selection criterion. Partitions whose mutual information is less than a set threshold value are discarded from the ensemble. Finally, the selected partitions that form the ensemble are combined by the consensus function to obtain the final clustering result. Our consensus function uses the original features of the dataset together with the partition results to attain the final clustering. Experiments show that our algorithm achieves better clustering accuracy than the classical k-means algorithm on both synthetic and real datasets.
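The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether the mutual information is normalized, how the reference partition is scored, or what threshold is used, so the code below assumes plain (unnormalized) mutual information in nats, takes the reference partition as given, and uses the hypothetical names `mutual_information` and `select_consistent` and an arbitrary threshold of 0.5.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two label sequences.

    Assumption: plain, unnormalized mutual information; the abstract
    does not state which variant the paper uses.
    """
    if len(a) != len(b):
        raise ValueError("partitions must label the same data points")
    n = len(a)
    count_a = Counter(a)            # cluster sizes in partition a
    count_b = Counter(b)            # cluster sizes in partition b
    count_ab = Counter(zip(a, b))   # sizes of pairwise cluster overlaps
    mi = 0.0
    for (x, y), n_xy in count_ab.items():
        mi += (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def select_consistent(reference, partitions, threshold):
    """Keep partitions whose mutual information with the reference
    is at least `threshold`; the rest are discarded.

    Assumption: consistency is measured against the reference partition.
    """
    return [p for p in partitions
            if mutual_information(reference, p) >= threshold]


# Toy data: four points, two candidate partitions.
reference = [0, 0, 1, 1]
candidates = [
    [1, 1, 0, 0],  # same grouping as the reference, labels swapped
    [0, 1, 0, 1],  # grouping independent of the reference
]
kept = select_consistent(reference, candidates, threshold=0.5)
print(kept)
```

In this toy case the first candidate shares log 2 (about 0.693) nats with the reference and is kept, while the second shares none and is discarded. Mutual information ignores label names, which is why the relabeled partition still counts as consistent.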
Keywords
data mining; pattern clustering; set theory; statistical analysis; cluster ensembles; consensus function; k-means algorithm; knowledge discovery; multiple clustering; reference partition; revenue authority; statistical consensus method; Blood; Diabetes; Iris; Noise; Clustering; Data Mining; Ensembles; Statistical Consensus; Unsupervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Progress in Informatics and Computing (PIC), 2010 IEEE International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4244-6788-4
Type
conf
DOI
10.1109/PIC.2010.5687411
Filename
5687411