DocumentCode :
2234543
Title :
The improvement and implementation of clustering algorithm based on multi-core computing
Author :
Dong, Liangyu ; Xu, Dongping ; Liu, Zhenzhen ; Wang, Shasha
Author_Institution :
College of Computer Science and Technology, Wuhan University of Technology, China
fYear :
2015
fDate :
6-8 July 2015
Firstpage :
405
Lastpage :
411
Abstract :
Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. When abstract objects are appropriately represented in a vector space, the similarity between objects is equivalent to the similarity between their vectors. Hence, problems such as clustering a limited data set, and the accuracy and efficiency of that clustering, can be addressed by calculating the similarity between vectors. As research on clustering algorithms for limited data sets has deepened and matured, these algorithms have been applied in many fields, including commerce, industry, daily life, and national defense. In the pursuit of higher efficiency in these applications, however, the volume of data grows from limited to massive, and the clustering workload grows accordingly. As a result, achieving the goals of clustering with traditional serial algorithms faces a severe challenge. The emergence of the Hadoop cloud computing platform offers a way to cluster massive data. Even so, under these new circumstances the efficiency and accuracy of clustering remain central concerns for information specialists. This paper proposes a parallel K-means clustering algorithm based on the Hadoop platform and the MapReduce programming model, aimed at improving the traditional serial K-means algorithm; it also improves the random selection of initial cluster centers in K-means by combining it with the Canopy algorithm. The experimental results show that the improved algorithm reduces time complexity, and that both the accuracy of the results and the execution efficiency increase by 40%.
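The combination described in the abstract (Canopy-chosen initial centers feeding a MapReduce-style K-means) can be illustrated with a minimal single-process sketch. It is not the authors' Hadoop implementation: it assumes Euclidean distance, hand-chosen Canopy thresholds T1 > T2, and uses each canopy's member mean as an initial center. The map, shuffle, and reduce phases are simulated with plain loops and a dictionary, and all function names (`canopy_centers`, `kmeans_map`, `kmeans_reduce`, `parallel_kmeans`, `nearest`) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def canopy_centers(points: List[Point], t1: float, t2: float) -> List[Point]:
    """Canopy pass (T1 = loose, T2 = tight threshold); one center per canopy."""
    if not t1 > t2 > 0:
        raise ValueError("Canopy thresholds must satisfy T1 > T2 > 0")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every remaining point within the loose threshold T1 joins this canopy.
        members = [p for p in remaining if math.dist(p, seed) < t1]
        # Points within the tight threshold T2 (including the seed itself)
        # leave the pool, so the loop always makes progress.
        remaining = [p for p in remaining if math.dist(p, seed) >= t2]
        centers.append(mean(members))
    return centers


def nearest(point: Point, centers: List[Point]) -> int:
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans_map(point: Point, centers: List[Point]):
    """Mapper: emit (nearest center id, (point, count = 1))."""
    return nearest(point, centers), (point, 1)


def kmeans_reduce(values):
    """Reducer: sum coordinates and counts for one center id, return the new center."""
    total, count = None, 0
    for point, c in values:
        total = point if total is None else tuple(a + b for a, b in zip(total, point))
        count += c
    return tuple(x / count for x in total)


def parallel_kmeans(points: List[Point], centers: List[Point],
                    max_iter: int = 20, tol: float = 1e-6) -> List[Point]:
    """One simulated MapReduce job per iteration, until the centers stop moving."""
    for _ in range(max_iter):
        groups: Dict[int, list] = defaultdict(list)
        for p in points:  # map + shuffle (group emitted values by key)
            key, value = kmeans_map(p, centers)
            groups[key].append(value)
        new_centers = list(centers)  # a center that receives no points stays put
        for key, values in groups.items():  # reduce
            new_centers[key] = kmeans_reduce(values)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
            (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
    init = canopy_centers(data, t1=3.0, t2=1.5)
    final = parallel_kmeans(data, init)
    print(len(init), [nearest(p, final) for p in data])
```

On this toy data the Canopy pass yields two canopies, so k = 2 follows from the thresholds rather than from a random guess. On a real cluster, each iteration of the loop would correspond to one MapReduce job, with the current centers distributed to every mapper.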
Keywords :
Iris; Canopy; Clustering; Hadoop; K-means; MapReduce;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cognitive Informatics & Cognitive Computing (ICCI*CC), 2015 IEEE 14th International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
978-1-4673-7289-3
Type :
conf
DOI :
10.1109/ICCI-CC.2015.7259417
Filename :
7259417