DocumentCode :
678431
Title :
Distributed K-Means Clustering with Low Transmission Cost
Author :
Coelho Naldi, Murilo ; Gabrielli Barreto Campello, Ricardo Jose
Author_Institution :
Dept. of Exact & Technol. Sci., Fed. Univ. of Vicosa-UFV, Paranaiba, Brazil
fYear :
2013
fDate :
19-24 Oct. 2013
Firstpage :
70
Lastpage :
75
Abstract :
Dealing with large amounts of data is one of the challenges for clustering, and it often requires large data sets to be distributed across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been named one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires that the number of clusters be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown and data transmission is slow or costly. A collection of algorithms is proposed that combines k-means clustering of each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses, and the experimental one, through a comparative evaluation of experimental results supported by statistical tests.
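The single-round idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed number of clusters k (the paper targets the case where k is unknown), and the function names local_summary and merge_summaries are hypothetical. Each site runs k-means locally and transmits only its centroids and cluster sizes; a coordinator then runs a weighted k-means over those summaries.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm on weighted points; returns (centroids, cluster masses)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            # Assign each point to its nearest current centroid.
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Recompute centroids as weighted means; keep empty clusters in place.
        new = [
            [s / mass[j] for s in sums[j]] if mass[j] > 0 else centroids[j]
            for j in range(k)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, mass


def local_summary(data, k):
    """Hypothetical site-side step: cluster locally, keep (centroid, size) pairs."""
    centroids, mass = weighted_kmeans(data, [1.0] * len(data), k)
    return [(c, m) for c, m in zip(centroids, mass) if m > 0]


def merge_summaries(summaries, k):
    """Hypothetical coordinator step: weighted k-means over all received centroids."""
    points = [tuple(c) for site in summaries for c, _ in site]
    weights = [m for site in summaries for _, m in site]
    return weighted_kmeans(points, weights, k)
```

In this sketch each site sends at most k centroids and k counts, so the transmitted volume depends on k and the dimensionality rather than on the number of local objects. The result is an approximation of what centralized k-means would produce.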
Keywords :
approximation theory; computational complexity; data mining; pattern clustering; statistical testing; approximation model; asymptotic complexity analysis; cluster prototype selection; clustering algorithms; clustering techniques; data clustering; data mining algorithms; data transmission rate; distributed data; distributed k-means clustering; statistical tests; transmission cost; Approximation algorithms; Clustering algorithms; Data communication; Distributed databases; Partitioning algorithms; Sociology; Statistics; clustering; distributed data sets; k-means; low data transfer;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Systems (BRACIS), 2013 Brazilian Conference on
Conference_Location :
Fortaleza
Type :
conf
DOI :
10.1109/BRACIS.2013.20
Filename :
6726428