DocumentCode :
3585930
Title :
A parallel sampling-PSO-multi-core-K-means algorithm using mapreduce
Author :
Bousbaci, Abdelhak ; Kamel, Nadjet
Author_Institution :
Comput. Sci. Dept., USTHB, Algiers, Algeria
fYear :
2014
Firstpage :
129
Lastpage :
134
Abstract :
Clustering partitions data into groups such that data in the same group are similar. Many clustering algorithms have been proposed in the literature. K-means is the most widely used because of its simplicity of implementation and its efficiency. Many clustering algorithms build on K-means with the aim of improving execution time, clustering quality, or both. Clustering quality can be improved by an optimal selection of the initial centroids, for example using meta-heuristics. Execution time can be improved using parallelism. In this paper, we propose a parallel hybrid K-means algorithm that uses Google's MapReduce framework for parallelism and the PSO (particle swarm optimization) meta-heuristic for choosing the initial centroids. This algorithm is used to cluster multi-dimensional data sets. The results show that using a network of machines to process data improves both the execution time and the clustering quality.
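The abstract does not give the authors' exact pipeline, so the following is only a minimal sketch of the general idea under stated assumptions. A standard global-best PSO searches for k initial centroids on a random sample of the data, scoring each particle by the sum of squared errors (SSE); K-means then runs as repeated map (assign points to the nearest centroid, emit per-cluster partial sums) and reduce (average the partial sums) phases. All function names and the parameters `w`, `c1`, `c2`, particle count, and sample size are hypothetical. The MapReduce phases are simulated sequentially in one process (no Hadoop), and the multi-core aspect is not modelled.

```python
import random

def sse(points, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
               for pt in points)

def nearest(pt, centroids):
    # Index of the centroid closest to pt (squared Euclidean distance).
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))

def pso_init(sample, k, particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Pick k initial centroids with global-best PSO on a sample (hypothetical params)."""
    rng = random.Random(seed)
    d = len(sample[0])

    def decode(x):
        # A particle is one flat vector holding k centroids of dimension d.
        return [x[j * d:(j + 1) * d] for j in range(k)]

    # Start each particle from k randomly chosen sample points.
    xs = [[v for pt in rng.sample(sample, k) for v in pt] for _ in range(particles)]
    vs = [[0.0] * (k * d) for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pcost = [sse(sample, decode(x)) for x in xs]
    g = min(range(particles), key=lambda i: pcost[i])
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(iters):
        for i in range(particles):
            for t in range(k * d):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vs[i][t] = (w * vs[i][t]
                            + c1 * r1 * (pbest[i][t] - xs[i][t])
                            + c2 * r2 * (gbest[t] - xs[i][t]))
                xs[i][t] += vs[i][t]
            cost = sse(sample, decode(xs[i]))
            if cost < pcost[i]:
                pbest[i], pcost[i] = xs[i][:], cost
                if cost < gcost:
                    gbest, gcost = xs[i][:], cost
    return decode(gbest)

def map_phase(chunk, centroids):
    # Mapper with a local combiner: emit (cluster_id, (partial_sum, count)).
    out = {}
    for pt in chunk:
        j = nearest(pt, centroids)
        s, n = out.get(j, ([0.0] * len(pt), 0))
        out[j] = ([a + b for a, b in zip(s, pt)], n + 1)
    return list(out.items())

def reduce_phase(pairs, centroids):
    # Reducer: merge partial sums per cluster key, then average them.
    acc = {}
    for j, (s, n) in pairs:
        s0, n0 = acc.get(j, ([0.0] * len(s), 0))
        acc[j] = ([a + b for a, b in zip(s0, s)], n0 + n)
    # A cluster that received no points keeps its previous centroid.
    return [[v / acc[j][1] for v in acc[j][0]] if j in acc else centroids[j]
            for j in range(len(centroids))]

def kmeans_mapreduce(data, k, n_chunks=4, max_iter=20, sample_size=50, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    centroids = pso_init(sample, k, seed=seed)
    chunks = [data[i::n_chunks] for i in range(n_chunks)]  # simulated input splits
    for _ in range(max_iter):
        pairs = [kv for ch in chunks for kv in map_phase(ch, centroids)]
        new = reduce_phase(pairs, centroids)
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids
```

For example, calling `kmeans_mapreduce(data, k=2)` on two well-separated Gaussian blobs should return one centroid near each blob mean. Emitting per-chunk partial sums rather than individual points mirrors the usual combiner optimisation for MapReduce K-means, since it keeps the shuffle volume proportional to k rather than to the data size.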
Keywords :
data handling; multiprocessing systems; parallel algorithms; particle swarm optimisation; pattern clustering; Google MapReduce framework; PSO metaheuristics; clustering algorithm; clustering quality; multidimensional data set; optimal selection; parallel hybrid k-means; parallel sampling-PSO-multicore-k-means algorithm; partitioning data; Algorithm design and analysis; Clustering algorithms; Heuristic algorithms; Instruction sets; Message systems; Parallel processing; Partitioning algorithms; K-means; MapReduce; PSO; Sampling; Shared memory;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Hybrid Intelligent Systems (HIS), 2014 14th International Conference on
Print_ISBN :
978-1-4799-7632-4
Type :
conf
DOI :
10.1109/HIS.2014.7086185
Filename :
7086185