Title :
Minimum Spanning Tree Based Classification Model for Massive Data with MapReduce Implementation
Author :
Chang, Jin ; Luo, Jun ; Huang, Joshua Zhexue ; Feng, Shengzhong ; Fan, Jianping
Author_Institution :
Shenzhen Institutes of Adv. Technol., Chinese Acad. of Sci., Shenzhen, China
Abstract :
Rapid growth of data has provided us with more information, yet it challenges traditional techniques for extracting useful knowledge. In this paper, we propose MCMM, a Minimum spanning tree (MST) based Classification model for Massive data with MapReduce implementation. It can be viewed as an intermediate model between the traditional K nearest neighbor (KNN) method and cluster-based classification methods, aiming to overcome their disadvantages and cope with large amounts of data. Our model is implemented on the Hadoop platform using its MapReduce programming framework, which is particularly suitable for cloud computing. We ran experiments on several data sets, including real-world data from the UCI repository and synthetic data, using Downing 4000 clusters with Hadoop installed. The results show that our model generally outperforms KNN and some other classification methods with respect to accuracy and scalability.
Keywords :
cloud computing; data mining; MapReduce implementation; classification model; graph-based mining; massive data; minimum spanning tree; MapReduce; classification;
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
DOI :
10.1109/ICDMW.2010.14