Title :
Research on Parallel Data Stream Clustering Algorithm Based on Grid and Density
Author :
Weihua Hu;Mingzhong Cheng;Guoping Wu;Liang Wu
Author_Institution :
Sch. of Comput. Sci. &
Abstract :
With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, and real-time data stream clustering has become a hot topic in data stream mining. Some existing data stream clustering algorithms cannot deal effectively with high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points in a timely manner. To address these issues, this paper proposes PGDC-Stream, an algorithm based on grid and density for clustering data streams in a parallel distributed environment [4]. The algorithm adopts a density threshold function to handle noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data streams in real time. The Map-Reduce framework is used for parallel cluster analysis of the data streams.
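The general idea of grid- and density-based clustering in a map/reduce style can be illustrated with a minimal sketch. This is not the PGDC-Stream algorithm itself, whose details the abstract does not give. The function names, the cell size, and the fixed density threshold are all assumptions made for illustration; in particular, a constant threshold stands in for the paper's density threshold function, and plain Python functions stand in for a real Map-Reduce framework.

```python
from collections import Counter, deque
from functools import reduce
from itertools import product


def map_partition(points, cell_size):
    """Map step (hypothetical): count the points of one partition per grid cell."""
    counts = Counter()
    for p in points:
        cell = tuple(int(x // cell_size) for x in p)
        counts[cell] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step (hypothetical): merge per-cell counts from two partitions."""
    return a + b


def dense_cells(counts, threshold):
    """Keep cells whose count reaches the threshold; sparser cells are treated as noise."""
    return {cell for cell, n in counts.items() if n >= threshold}


def connect(cells):
    """Group adjacent dense cells into clusters with a breadth-first search."""
    remaining = set(cells)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            # Visit all 3^d neighbouring cells; costly when d is large.
            for offset in product((-1, 0, 1), repeat=len(cur)):
                nb = tuple(c + o for c, o in zip(cur, offset))
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters


# Two partitions of a toy 2-D stream; the values are made up for illustration.
partitions = [
    [(0.1, 0.2), (0.3, 0.4), (1.2, 0.5)],
    [(0.5, 0.6), (1.5, 0.1), (9.0, 9.0)],
]
counts = reduce(reduce_counts, (map_partition(p, 1.0) for p in partitions), Counter())
dense = dense_cells(counts, threshold=2)
clusters = connect(dense)
```

Because clusters are formed by linking neighbouring dense cells, their shape is not restricted to spheres, and the isolated point at (9.0, 9.0) falls in a sparse cell and is dropped as noise. A streaming version would also need to age or periodically re-inspect the cell counts, which this sketch omits.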
Keywords :
"Clustering algorithms","Algorithm design and analysis","Analytical models","Real-time systems","Inspection","Programming"
Conference_Title :
Computer Science and Mechanical Automation (CSMA), 2015 International Conference on
DOI :
10.1109/CSMA.2015.21