Title :
Combining Parallel Self-Organizing Maps and K-Means to Cluster Distributed Data
Author :
Gorgônio, Flavius L. ; Costa, José Alfredo F.
Author_Institution :
Fed. Univ. of Rio Grande do Norte, Natal
Abstract :
Clustering is the process of discovering groups within multidimensional data based on similarity, with minimal prior knowledge of the data's structure. In previous work, we presented partSOM, an algorithm based on self-organizing maps (SOM) for clustering distributed datasets. This work extends that approach with a strategy for efficient cluster analysis in distributed databases using SOM and K-means. The proposed strategy applies the SOM algorithm separately to each distributed dataset, each corresponding to a vertical partition of the database, to obtain a representative subset of each local dataset. These representative subsets are then sent to a central site, which fuses the partial results and applies the SOM and K-means algorithms to obtain the final result. Experimental results on different datasets are compared with those of the traditional SOM and partSOM approaches.
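The strategy summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the representative subsets are built or fused, so the sketch assumes that each site replaces every local sample with the codebook vector of its best-matching unit, and that the central site concatenates these projections column-wise. The function names (train_som, bmu_indices, kmeans, cluster_vertical_partitions), the 4x4 map size, and the learning schedule are all hypothetical choices made for illustration.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; return a codebook of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise codebook vectors from randomly chosen samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def bmu_indices(data, weights):
    """Index of the nearest codebook vector for every row of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = bmu_indices(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return bmu_indices(points, centers)

def cluster_vertical_partitions(partitions, k, seed=0):
    """Assumed pipeline: local SOMs -> fuse projections -> central SOM -> K-means."""
    projected = []
    for p, local in enumerate(partitions):
        codebook = train_som(local, seed=seed + p)   # runs at the local site
        idx = bmu_indices(local, codebook)           # only idx and codebook are sent
        projected.append(codebook[idx])              # rebuilt at the central site
    fused = np.hstack(projected)                     # join vertical partitions by row
    central = train_som(fused, seed=seed)
    unit_labels = kmeans(central, k, seed=seed)      # cluster the central codebook
    return unit_labels[bmu_indices(fused, central)]  # label each original sample

# Toy data: two groups over four attributes, split into two vertical partitions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.3, (50, 4)), rng.normal(3, 0.3, (50, 4))])
labels = cluster_vertical_partitions([X[:, :2], X[:, 2:]], k=2)
print(labels.shape)
```

In this sketch only the best-matching-unit indices and the local codebooks would need to leave each site, so raw attribute values stay local. Whether that matches the data transfer in the published method is an assumption.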
Keywords :
data handling; distributed databases; pattern clustering; self-organising feature maps; database vertical partitions; distributed data clustering; k-means; multidimensional data; parallel self-organizing maps; partSOM; Clustering algorithms; Data analysis; Data mining; Data privacy; Partitioning algorithms; Pattern analysis; Self organizing feature maps; Signal analysis; Storage automation; Distributed data mining; self-organizing maps
Conference_Title :
Computational Science and Engineering Workshops, 2008. CSEWORKSHOPS '08. 11th IEEE International Conference on
Conference_Location :
São Paulo
Print_ISBN :
978-0-7695-3257-8
DOI :
10.1109/CSEW.2008.65