DocumentCode
3127468
Title
Epidemic K-Means Clustering
Author
Fatta, Giuseppe Di ; Blasa, Francesco ; Cafiero, Simone ; Fortino, Giancarlo
Author_Institution
Sch. of Syst. Eng., Univ. of Reading, Reading, UK
fYear
2011
fDate
11-11 Dec. 2011
Firstpage
151
Lastpage
158
Abstract
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Keywords
data mining; distributed memory systems; parallel algorithms; pattern clustering; cluster analysis; data mining methods; decentralised algorithm; distributed memory systems; epidemic k-means clustering; interconnection networks; networked systems; parallel algorithm; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Peer to peer computing; Protocols; Vectors; Distributed clustering; K-Means; epidemic protocols; gossip-based aggregation; peer-to-peer data mining;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location
Vancouver, BC
Print_ISBN
978-1-4673-0005-6
Type
conf
DOI
10.1109/ICDMW.2011.76
Filename
6137374
Link To Document