DocumentCode
2836106
Title
Combining Parallel Self-Organizing Maps and K-Means to Cluster Distributed Data
Author
Gorgônio, Flavius L.; Costa, José Alfredo F.
Author_Institution
Fed. Univ. of Rio Grande do Norte, Natal
fYear
2008
fDate
16-18 July 2008
Firstpage
53
Lastpage
58
Abstract
Clustering is the process of discovering groups within multidimensional data, based on similarities, with minimal knowledge of the data's structure. In previous work, we presented an algorithm (partSOM) for clustering distributed datasets based on self-organizing maps (SOM). This work extends that approach with a strategy for efficient cluster analysis of distributed databases using SOM and K-means. The proposed strategy applies the SOM algorithm separately to each distributed dataset, each corresponding to a vertical partition of the database, to obtain a representative subset of each local dataset. These representative subsets are then sent to a central site, which fuses the partial results and applies the SOM and K-means algorithms to obtain the final result. Experimental results on different datasets are compared with those of the traditional SOM and partSOM approaches.
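The abstract describes a three-stage pipeline: a local SOM at each site, fusion at a central site, then a global SOM followed by K-means. The minimal NumPy sketch below illustrates that flow under stated assumptions, not the authors' partSOM code. It assumes that records are row-aligned across the vertical partitions, that each site sends its SOM codebook plus per-record best-matching-unit (BMU) indices, and that the central site fuses them by concatenating each record's local prototypes and keeping the distinct rows as the representative subset. The fusion rule, grid sizes, learning-rate schedule, restart count, and all function names (`train_som`, `kmeans`, `cluster_vertical_partitions`) are hypothetical choices made for illustration.

```python
import numpy as np


def best_matching_units(data, codebook):
    """Index of the nearest prototype (squared Euclidean) for every row of data."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, seed=0):
    """Minimal online SOM with a Gaussian neighbourhood on a rows x cols grid."""
    rng = np.random.default_rng(seed)
    n = len(data)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly drawn records.
    codebook = data[rng.choice(n, rows * cols, replace=n < rows * cols)].astype(float)
    sigma0, total, t = max(rows, cols) / 2.0, epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = np.argmin(((codebook - data[i]) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            codebook += lr * h[:, None] * (data[i] - codebook)
            t += 1
    return codebook


def kmeans(points, k, iters=100, n_init=10, seed=0):
    """Lloyd's K-means with random restarts; keeps the run with the lowest SSE."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = points[rng.choice(len(points), k, replace=False)].astype(float)
        for _ in range(iters):
            labels = best_matching_units(points, centers)
            # Empty clusters keep their previous center.
            new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = best_matching_units(points, centers)
        sse = ((points - centers[labels]) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, centers, labels)
    return best[1], best[2]


def cluster_vertical_partitions(partitions, k, seed=0):
    """partitions: list of (n_records, d_i) arrays whose rows refer to the same records."""
    # Local step (one per site): train a SOM on that site's columns only.
    local_cbs = [train_som(p, seed=seed) for p in partitions]
    local_bmus = [best_matching_units(p, cb) for p, cb in zip(partitions, local_cbs)]
    # Fusion at the central site (ASSUMPTION): rebuild each record by concatenating
    # its local BMU prototypes, then keep the distinct rows as the representative subset.
    rebuilt = np.hstack([cb[b] for cb, b in zip(local_cbs, local_bmus)])
    representative = np.unique(rebuilt, axis=0)
    # Central step: SOM on the representative subset, then K-means on its prototypes.
    global_cb = train_som(representative, seed=seed)
    _, proto_labels = kmeans(global_cb, k, seed=seed)
    # Each record inherits the cluster of its best-matching global prototype.
    return proto_labels[best_matching_units(rebuilt, global_cb)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(c, 0.3, size=(40, 4)) for c in (0.0, 5.0, 10.0)])
    # Two hypothetical sites, each holding two of the four attributes.
    labels = cluster_vertical_partitions([X[:, :2], X[:, 2:]], k=3)
    print(np.bincount(labels, minlength=3))
```

Clustering the SOM prototypes with K-means, rather than the raw records, keeps the final K-means step small: it runs on the fixed number of global map units regardless of how many records the distributed sites hold.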
Keywords
data handling; distributed databases; pattern clustering; self-organising feature maps; database vertical partitions; distributed data clustering; k-means; multidimensional data; parallel self-organizing maps; partSOM; Clustering algorithms; Data analysis; Data mining; Data privacy; Distributed databases; Partitioning algorithms; Pattern analysis; Self organizing feature maps; Signal analysis; Storage automation; Distributed data mining; self-organizing maps
fLanguage
English
Publisher
IEEE
Conference_Title
Computational Science and Engineering Workshops, 2008. CSEWORKSHOPS '08. 11th IEEE International Conference on
Conference_Location
São Paulo
Print_ISBN
978-0-7695-3257-8
Type
conf
DOI
10.1109/CSEW.2008.65
Filename
4625039