DocumentCode :
2068783
Title :
Statistical consensus method for cluster ensembles
Author :
Deus, Clement ; Liao, Zhifang
Author_Institution :
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
Volume :
1
fYear :
2010
fDate :
10-12 Dec. 2010
Firstpage :
185
Lastpage :
189
Abstract :
In data mining and knowledge discovery, clustering is one of the most important techniques for discovering salient structures in data. This paper explores a statistical consensus method for combining the results of multiple clusterings, or partitions. We developed the idea while working with customs data from a Revenue Authority. The partitions are generated by running the k-means algorithm several times on the same data, each time with different parameter initializations or subspaces, which produces diverse clustering results. To produce the final clustering, our algorithm first selects a reference partition, the one with the best clustering result among the generated partitions. It then selects the partitions that are consistent, using the mutual information between partitions as the selection criterion. Partitions whose mutual information falls below a set threshold are discarded from the ensemble. Finally, the selected partitions that make up the ensemble are combined by the consensus function to obtain the final clustering. Our consensus function uses the original features of the dataset together with the partition results to attain the final clustering. Experiments show that our algorithm achieves higher accuracy than the classical k-means algorithm on both synthetic and real datasets.
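The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so several points below are assumptions rather than the authors' method: the "best" reference partition is taken to be the run with the lowest within-cluster sum of squares, mutual information is measured against the reference and normalised to [0, 1], and the consensus function is a stand-in (majority vote over labels aligned to the reference, followed by one nearest-centroid pass in the original feature space). The names `kmeans`, `normalized_mutual_info`, `consensus` and the parameter values are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1).sum()

def normalized_mutual_info(a, b):
    """Mutual information of two label vectors, divided by sqrt(H(a) * H(b))."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)          # contingency table
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum()
    ha, hb = -(pa * np.log(pa)).sum(), -(pb * np.log(pb)).sum()
    return mi / np.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def consensus(X, k=3, n_runs=10, threshold=0.7, seed=0):
    """Returns (final labels, number of partitions kept in the ensemble)."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_runs)]
    # Assumption: the reference partition is the run with the lowest WCSS.
    ref = min(runs, key=lambda r: r[1])[0]
    # Discard partitions whose (normalised) mutual information with the
    # reference is below the threshold.
    kept = [lab for lab, _ in runs
            if normalized_mutual_info(ref, lab) >= threshold]
    # Stand-in consensus: map each cluster to the reference cluster it
    # overlaps most, then take a majority vote per point.
    votes = np.zeros((len(X), k))
    for lab in kept:
        for c in range(k):
            members = lab == c
            if members.any():
                target = np.bincount(ref[members], minlength=k).argmax()
                votes[members, target] += 1
    voted = votes.argmax(axis=1)
    # Use the original features: recompute centroids and reassign once.
    present = np.unique(voted)
    centers = np.array([X[voted == c].mean(axis=0) for c in present])
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return present[dist.argmin(axis=1)], len(kept)

# Usage on synthetic data: three Gaussian blobs in two dimensions.
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(loc=m, scale=0.5, size=(50, 2))
                    for m in ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])])
final_labels, n_kept = consensus(X_demo, k=3)
print(n_kept, "partitions kept;", len(np.unique(final_labels)), "clusters found")
```

The threshold of 0.7 is arbitrary here; since the abstract only says "a set threshold value", it would need tuning for real data.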
Keywords :
data mining; pattern clustering; set theory; statistical analysis; cluster ensembles; consensus function; data mining; k-mean algorithm; knowledge discovery; multiple clustering; reference partition; revenue authority; statistical consensus method; Blood; Diabetes; Iris; Noise; Clustering; Data Mining; Ensembles; Statistical Consensus; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Progress in Informatics and Computing (PIC), 2010 IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-6788-4
Type :
conf
DOI :
10.1109/PIC.2010.5687411
Filename :
5687411