• DocumentCode
    478740
  • Title

    Validation Measures for Clustering Algorithms Incorporating Biological Information

  • Author

    Datta, Soupayan ; Datta, Soupayan

  • Author_Institution
    Sch. of Public Health & Inf. Sci., Louisville Univ., KY
  • Volume
    1
  • fYear
    2006
  • fDate
    20-24 June 2006
  • Firstpage
    131
  • Lastpage
    135
  • Abstract
    Cluster analysis is the procedure most commonly performed on a set of gene expression profiles, and it is often regarded as a first step. A closely related problem is selecting a clustering algorithm that is optimal in some sense from the long list of algorithms that currently exist. In this paper, we propose two validation measures, each with two parts: one measures the statistical consistency (stability) of the clusters produced, and the other represents their biological functional consistency. A good clustering algorithm should have a small value for these measures. We illustrate our methods using two sets of expression profiles obtained from a breast cancer data set. Six well-known clustering algorithms (UPGMA, k-means, Diana, Fanny, model-based, and SOM) were evaluated. Although the exact ordering depends on the particular data set (expression profiles) used and the validation measure employed, UPGMA appears overall to be optimal for the cancer data set that we considered.
  • Keywords
    biology computing; cancer; data handling; genetics; pattern clustering; statistical analysis; UPGMA; biological functional consistency; biological information; breast cancer data set; clustering algorithm; gene expression profiles; statistical consistency; Biological system modeling; Biology; Breast cancer; Clustering algorithms; Clustering methods; Gene expression; Information analysis; Performance analysis; Public healthcare; Stability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on
  • Conference_Location
    Hangzhou, Zhejiang
  • Print_ISBN
    0-7695-2581-4
  • Type

    conf

  • DOI
    10.1109/IMSCCS.2006.139
  • Filename
    4673536