DocumentCode :
3074429
Title :
A Novel Approach for Automatic Number of Clusters Detection in Microarray Data Based on Consensus Clustering
Author :
Vinh, Nguyen Xuan ; Epps, Julien
Author_Institution :
Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Sydney, NSW, Australia
fYear :
2009
fDate :
22-24 June 2009
Firstpage :
84
Lastpage :
91
Abstract :
Estimating the true number of clusters in a data set is one of the major challenges in cluster analysis. Yet in certain domains, knowing the true number of clusters is of high importance. For example, in medical research, detecting the true number of groups and sub-groups of cancer would be of utmost importance for their effective treatment. In this paper we propose a novel method to estimate the number of clusters in a microarray data set based on the consensus clustering approach. Although the main objective of consensus clustering is to discover a robust and high-quality cluster structure in a data set, closer inspection of the set of clusterings obtained can often give valuable information about the appropriate number of clusters present. More specifically, the set of clusterings obtained when the specified number of clusters coincides with the true number of clusters tends to be less diverse. To quantify this diversity we develop a novel index, namely the Consensus Index (CI), which is built upon a suitable clustering similarity measure such as the well-known Adjusted Rand Index (ARI) or our recently developed information-theoretic index, namely the Adjusted Mutual Information (AMI). Our experiments on both synthetic and real microarray data sets indicate that the CI is a useful indicator for determining the appropriate number of clusters.
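The idea in the abstract can be illustrated with a minimal Python sketch. It assumes, as a reading of the abstract rather than a statement of the paper's exact definition, that the Consensus Index for a candidate number of clusters is the mean pairwise similarity between the clusterings generated for that number, and that the candidate with the highest value is selected. The function names `consensus_index` and `estimate_k` are hypothetical, and the ARI is computed with the standard pair-counting formula; AMI could be passed in as the similarity measure instead.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(a) != len(b):
        raise ValueError("labelings must have equal length")
    total = comb(len(a), 2)
    if total == 0:
        return 1.0  # fewer than two items: nothing to disagree about
    # Pair counts from the contingency table and from its two margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


def consensus_index(clusterings, similarity=adjusted_rand_index):
    """Mean pairwise similarity of a set of clusterings (assumed CI form)."""
    pairs = list(combinations(clusterings, 2))
    if not pairs:
        raise ValueError("need at least two clusterings")
    return sum(similarity(u, v) for u, v in pairs) / len(pairs)


def estimate_k(clusterings_by_k, similarity=adjusted_rand_index):
    """Pick the candidate k whose clusterings agree with each other most."""
    return max(
        clusterings_by_k,
        key=lambda k: consensus_index(clusterings_by_k[k], similarity),
    )
```

In use, `clusterings_by_k` would map each candidate number of clusters to the label vectors produced by repeated runs of a base clustering algorithm (for example, on resampled or perturbed data); how those runs are generated is left open here.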
Keywords :
information theory; pattern clustering; adjusted mutual information; adjusted rand index; automatic number; cluster analysis; clustering similarity measure; clusters detection; consensus clustering approach; consensus index; high quality cluster structure; information theoretic based index; microarray data set; robust cluster structure; Bioinformatics; Biomedical engineering; Cancer detection; Clustering algorithms; Clustering methods; Inspection; Medical treatment; Mutual information; Robustness; Shape; adjusted mutual information (AMI); gene clustering; model selection; number of cluster detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and BioEngineering, 2009. BIBE '09. Ninth IEEE International Conference on
Conference_Location :
Taichung
Print_ISBN :
978-0-7695-3656-9
Type :
conf
DOI :
10.1109/BIBE.2009.19
Filename :
5211310