DocumentCode :
1666703
Title :
k-Means Performance Improvements with Centroid Calculation Heuristics Both for Serial and Parallel Environments
Author :
Karimov, Jeyhun ; Ozbayoglu, Murat ; Dogdu, Erdogan
Author_Institution :
Comput. Eng. Dept., TOBB Univ. of Econ. & Technol., Ankara, Turkey
fYear :
2015
Firstpage :
444
Lastpage :
451
Abstract :
K-means is the most widely used clustering algorithm due to its fairly straightforward implementation across various problems. However, as the number of clusters increases, the number of iterations also tends to increase slightly, and, as some studies in the literature indicate, there are still opportunities for improvement. In this study, improved implementations of the k-means algorithm are proposed that use centroid calculation heuristics to achieve a performance improvement over traditional k-means. Two versions of the algorithm are configured for different data sizes: one for small data and the other for big data implementations. Both the serial and MapReduce parallel implementations of the proposed algorithm are tested and analyzed using two different data sets with various numbers of clusters. The results show that the big data implementation model outperforms the other compared methods beyond a certain threshold level, and the small data implementation performs better as k increases.
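For orientation, the baseline that the abstract calls "traditional k-means" (Lloyd's iteration) can be phrased as a map step followed by a reduce step, which is the usual way k-means is ported to MapReduce. The sketch below is a minimal, single-process illustration under that assumption only; it does NOT implement the authors' centroid calculation heuristics, which the abstract does not describe. All function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) and the `max_iter`/`tol` parameters are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Hypothetical map step: emit (cluster_id, (point, 1)) for every point."""
    for pt in points:
        yield nearest(pt, centroids), (pt, 1)


def reduce_phase(pairs, dim):
    """Hypothetical reduce step: sum coordinates and counts per cluster, then average."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cid, (pt, n) in pairs:
        acc = sums[cid]
        for j, x in enumerate(pt):
            acc[j] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans(points, initial_centroids, max_iter=100, tol=1e-9):
    """Baseline Lloyd-style k-means driver; one map/reduce pass per iteration."""
    centroids = [tuple(c) for c in initial_centroids]
    dim = len(centroids[0])
    iterations = 0
    for _ in range(max_iter):
        iterations += 1
        new = reduce_phase(map_phase(points, centroids), dim)
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(i, centroids[i]) for i in range(len(centroids))]
        # Stop once no centroid moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids, iterations


centroids, n_iter = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(centroids, n_iter)  # [(0.0, 0.5), (10.0, 10.5)] 2
```

In a Hadoop setting the map and reduce functions would run on separate workers and the driver loop would relaunch one job per iteration, so the iteration count that the abstract mentions directly drives the number of jobs; this sketch only mirrors that structure in memory.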
Keywords :
Big Data; parallel processing; pattern clustering; MapReduce; big data implementations; centroid calculation heuristics; clustering algorithm; k-means performance improvements; parallel environments; serial environments; Clustering algorithms; Complexity theory; Computational modeling; Computers; Data models; Standards; Clustering; Hadoop; data mining; k-means; parallel algorithms; unsupervised learning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
Type :
conf
DOI :
10.1109/BigDataCongress.2015.72
Filename :
7207256