DocumentCode :
3683096
Title :
An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
Author :
Yingchi Mao; Ziyang Xu; Ping Ping; Longbao Wang
Author_Institution :
Coll. of Comput. &
fYear :
2015
Firstpage :
387
Lastpage :
392
Abstract :
Clustering algorithms are applied in many fields, especially data mining. As data volumes grow, the computation time of clustering algorithms becomes prohibitive under the traditional computing model. To handle big data, data mining algorithms have been moved from their original single-core or single-machine form to parallel and distributed processing, and parallel processing has become the most popular way to improve execution performance. This paper establishes a Hadoop distributed cluster on CloudStack and implements an optimal distributed K-Means clustering algorithm based on MapReduce. The proposed algorithm obtains good clustering quality with efficient execution time. Experimental results show that the optimal distributed K-Means clustering algorithm performs better when dealing with large-scale data sets.
Keywords :
"Clustering algorithms","Algorithm design and analysis","Distributed databases","Computational modeling","Data mining","Complexity theory","Virtual machining"
Publisher :
IEEE
Conference_Title :
Frontier of Computer Science and Technology (FCST), 2015 Ninth International Conference on
Type :
conf
DOI :
10.1109/FCST.2015.71
Filename :
7314711