DocumentCode :
2764953
Title :
Multi-source kernel k-means for clustering heterogeneous biomedical data
Author :
Phoungphol, Piyaphol ; Zhang, Yanqing
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
fYear :
2011
fDate :
12-15 Nov. 2011
Firstpage :
223
Lastpage :
228
Abstract :
In recent years, large and diverse biological databases have accumulated at many different locations. Analyzing distinct data sets from multiple sources yields more reliable results. However, combining heterogeneous data on a single server is difficult, most obviously because of data privacy, large data sizes, cost, and the geographical dispersion of the data sources. In this paper, we present two new algorithms for clustering data from multiple remote data sources using kernel k-means. The first is a center-based algorithm built on the k-means algorithm. The second uses distributed kernel k-means over multiple data sources. In the distributed scheme, each clustering method is executed only on its local data source, and partial clustering results are synchronized between data sources. To evaluate the proposed algorithms, we merged all data from the different sources into one large data set and ran kernel k-means on it for comparison. The results showed that our center-based algorithm greatly reduced the data transmitted between data sources while still yielding acceptable clustering results. Our distributed kernel k-means algorithm performed even better: its clustering results are very close to those generated by kernel k-means on the single merged data set.
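For orientation, the following is a minimal sketch of plain kernel k-means run on one merged data set, which is the comparison setting the abstract describes. It is not the authors' center-based or distributed algorithm, since the abstract does not specify them. The RBF kernel, the `gamma` value, the function names `rbf_kernel_matrix` and `kernel_kmeans`, the explicit `init_labels` argument, and the toy points are all assumptions made for illustration.

```python
import math


def rbf_kernel_matrix(points, gamma=0.1):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in points] for a in points]


def kernel_kmeans(K, k, init_labels, max_iter=100):
    """Lloyd-style kernel k-means on a precomputed Gram matrix K.

    The feature-space distance from point i to the mean of cluster c is
    K[i][i] - 2 * mean_j K[i][j] + mean_{j,l} K[j][l], with j, l in c.
    """
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term: mean of K over all pairs inside each cluster.
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(K[i][j] for j in m) / len(m)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
    return labels


# Toy data standing in for a merged data set: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = rbf_kernel_matrix(points, gamma=0.1)
# The third point starts in the wrong cluster on purpose.
labels = kernel_kmeans(K, k=2, init_labels=[0, 0, 1, 1, 1, 1])
print(labels)
```

The sketch needs only the Gram matrix, never the raw feature vectors, once the kernel has been evaluated. A multi-source variant would have to decide which kernel values or partial sums cross the network, and the abstract leaves that design open.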
Keywords :
biology computing; data analysis; distributed processing; pattern clustering; biological databases; center-based algorithm; cost; data analysis; data privacy; distributed kernel k-means; geographical locations; heterogeneous biomedical data; large data sizes; multiple data sources; multisource kernel k-means clustering; partial clustering; Approximation algorithms; Clustering algorithms; Distributed algorithms; Distributed databases; Kernel; Servers; biomedical data; k-means; kernel k-means; multi-source data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1612-6
Type :
conf
DOI :
10.1109/BIBMW.2011.6112378
Filename :
6112378