DocumentCode :
2027461
Title :
Gossip-based spectral clustering of distributed data streams
Author :
Talistu, Matt ; Moh, Teng-Sheng ; Moh, Melody
Author_Institution :
Dept. of Comput. Sci., San Jose State Univ., San Jose, CA, USA
fYear :
2015
fDate :
20-24 July 2015
Firstpage :
325
Lastpage :
333
Abstract :
With the growth of the Internet, social networks, and other distributed systems, an abundance of data about user transactions, network traffic, social interactions, and other areas is available for analysis. Extracting knowledge from this data has become a growing field of research, especially as the size of the data makes traditional data mining methods ineffective. Some approaches assume that the data resides at a central location or that a complete dataset is available for analysis. However, many modern applications consume distributed data streams: the dataset is spread across multiple locations, and each location has access to only a portion of the data stream. We propose a distributed data stream analysis method that uses hierarchical clustering for local online summaries, a gossip protocol for distributing these summaries, and spectral clustering for offline analysis. The resulting solution avoids the heavy computation and communication requirements of a centralized approach. Through experiments, we demonstrate that the proposed solution clusters the data streams accurately and is highly scalable. Its quality increases significantly as the number of microclusters increases, yet it remains fault-tolerant when this number is small. Finally, it achieves a level of accuracy similar to that of a centralized approach.
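The summary-distribution step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each node's local summary is a single (count, linear sum) pair over one-dimensional points, a format the abstract does not specify, and it uses a simple push-pull exchange with a uniformly random peer. The function names `summarize`, `gossip_until_complete`, and `global_mean` are hypothetical, and the hierarchical and spectral clustering stages are omitted.

```python
import random


def summarize(points):
    """Collapse a node's local points into one (count, linear_sum) summary.

    Assumption: a single microcluster per node, 1-D data.
    """
    return (len(points), sum(points))


def gossip_until_complete(summaries, seed=0):
    """Push-pull gossip: in each round every node merges its view with one random peer.

    Returns (views, rounds), where views[i] maps origin node id -> summary
    known to node i, and rounds is the number of rounds that were needed.
    """
    rng = random.Random(seed)
    n = len(summaries)
    # Initially each node knows only its own summary.
    views = [{i: s} for i, s in enumerate(summaries)]
    rounds = 0
    while any(len(view) < n for view in views):
        for i in range(n):
            j = rng.choice([k for k in range(n) if k != i])
            merged = {**views[i], **views[j]}
            # Both sides keep the union of what either of them knew.
            views[i] = dict(merged)
            views[j] = dict(merged)
        rounds += 1
    return views, rounds


def global_mean(view):
    """Estimate the global mean from the summaries a single node has collected."""
    total_count = sum(count for count, _ in view.values())
    total_sum = sum(linear_sum for _, linear_sum in view.values())
    return total_sum / total_count


# Four nodes, each seeing only its own slice of the stream.
local_data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [10.0]]
summaries = [summarize(points) for points in local_data]
views, rounds = gossip_until_complete(summaries)
```

Once the exchange completes, every node holds all summaries and can run the offline analysis locally without having seen any raw data from its peers. In this sketch, `global_mean(views[0])` stands in for that offline step.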
Keywords :
pattern clustering; peer-to-peer computing; central location; centralized approach; data analysis; data mining methods; data size; data stream clustering; distributed data stream analysis method; gossip protocol; gossip-based spectral clustering; hierarchical clustering; knowledge extraction; local online summary; microcluster; offline analysis; spectral clustering; Algorithm design and analysis; Clustering algorithms; Distributed databases; Fault tolerance; Fault tolerant systems; Peer-to-peer computing; Protocols; cluster evolution; distributed data stream analysis; gossip protocol; spectral clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing & Simulation (HPCS), 2015 International Conference on
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4673-7812-3
Type :
conf
DOI :
10.1109/HPCSim.2015.7237058
Filename :
7237058