Impact of different metrics on multi-view clustering

Author

Angela Serra;Dario Greco;Roberto Tagliaferri

Author_Institution

NeuRoNe Lab, Computer Science Department, University of Salerno, Via G. Paolo II, 132, Fisciano, Italy

fYear

2015

fDate

7/1/2015 12:00:00 AM

Firstpage

1

Lastpage

8

Abstract

Clustering of patients makes it possible to find groups of subjects with similar characteristics. This categorization can facilitate diagnosis, treatment decisions and prognosis prediction. Heterogeneous genome-wide data sources capture different biological aspects that can be integrated to better categorize patients. Clustering methods work by measuring how similar or dissimilar patients are in a suitable similarity space. Although several clustering methods have been proposed, there is no systematic comparative study of the impact of similarity metrics on cluster quality. We compared seven popular similarity measures (Pearson, Spearman and Kendall correlations; Euclidean, Canberra, Minkowski and Manhattan distances) in conjunction with two classical single-view clustering algorithms (partitioning around medoids and hierarchical clustering) and a late integration approach (matrix factorization) on high-dimensional multi-view cancer data from the TCGA repository. Performance was measured against the classification of tumour subcategories. Only the Euclidean and Minkowski distances yielded similar results to each other in terms of clustering similarity indexes. In terms of misclassification, by contrast, no single best similarity measure emerged; performance depends strongly on the data.

Keywords

"Occupational health","Genomics","Bioinformatics"

Publisher

IEEE

Conference_Titel

Neural Networks (IJCNN), 2015 International Joint Conference on

Electronic_ISBN

2161-4407

Type

conf

DOI

10.1109/IJCNN.2015.7280445

Filename

7280445