• DocumentCode
    3661136
  • Title

    Impact of different metrics on multi-view clustering

  • Author

    Angela Serra;Dario Greco;Roberto Tagliaferri

  • Author_Institution
    NeuRoNe Lab, Computer Science Department, University of Salerno, Via G. Paolo II, 132, Fisciano, Italy
  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Clustering of patients allows to find groups of subjects with similar characteristics. This categorization can facilitate diagnosis, treatment decision and prognosis prediction. Heterogeneous genome-wide data sources capture different biological aspects that can be integrated in order to better categorize the patients. Clustering methods work by comparing how patients are similar or dissimilar in a suitable similarity space. While several clustering methods have been proposed, there is no systematic comparative study concerning the impact of similarity metrics on the cluster quality. We compared seven popular similarity measures (Pearson, Spearman and Kendall Correlations; Euclidean, Canberra, Minkowski and Manhattan Distances) in conjunction with two classical single-view clustering algorithms and a late integration approach (partitioning around medoids, hierarchical clustering and matrix factorization approaches), on high dimensional multi-view cancer data coming from the TCGA repository. Performance was measured against tumour subcategories classification. Only Euclidean and Minkowski distances showed similar results in terms of clustering similarity indexes. On the other hand, an absolute best similarity measure did not emerge in terms of misclassification, but it strongly depends on the data.
  • Keywords
    "Occupational health","Genomics","Bioinformatics"
  • Publisher
    ieee
  • Conference_Titel
    Neural Networks (IJCNN), 2015 International Joint Conference on
  • Electronic_ISBN
    2161-4407
  • Type

    conf

  • DOI
    10.1109/IJCNN.2015.7280445
  • Filename
    7280445