DocumentCode
478740
Title
Validation Measures for Clustering Algorithms Incorporating Biological Information
Author
Datta, Soupayan ; Datta, Soupayan
Author_Institution
Sch. of Public Health & Inf. Sci., Louisville Univ., KY
Volume
1
fYear
2006
fDate
20-24 June 2006
Firstpage
131
Lastpage
135
Abstract
Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. A closely related problem is that of selecting a clustering algorithm that is optimal in some way from the rather impressive list of clustering algorithms that currently exist. In this paper, we propose two validation measures, each with two parts: one measuring the statistical consistency (stability) of the clusters produced and the other representing their biological functional consistency, so that a good clustering algorithm should have a small value for these measures. We illustrate our methods using two sets of expression profiles obtained from a breast cancer data set. Six well-known clustering algorithms (UPGMA, k-means, Diana, Fanny, model-based and SOM) were evaluated. Although the exact ordering depends on the particular data set (expression profiles) used and the validation measure employed, overall UPGMA appears to be the optimal algorithm for the cancer data set that we considered.
Keywords
biology computing; cancer; data handling; genetics; pattern clustering; statistical analysis; UPGMA; biological functional consistency; biological information; breast cancer data set; clustering algorithm; gene expression profiles; statistical consistency; Biological system modeling; Biology; Breast cancer; Clustering algorithms; Clustering methods; Gene expression; Information analysis; Performance analysis; Public healthcare; Stability;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on
Conference_Location
Hangzhou, Zhejiang
Print_ISBN
0-7695-2581-4
Type
conf
DOI
10.1109/IMSCCS.2006.139
Filename
4673536