Title :
MReC4.5: C4.5 Ensemble Classification with MapReduce
Author :
Wu, Gongqing ; Li, Haiguang ; Hu, Xuegang ; Bi, Yuanjun ; Zhang, Jing ; Wu, Xindong
Author_Institution :
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
Abstract :
Classification is an important technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on an analysis of the MapReduce computing paradigm and the ensemble learning process, we find that the parallel and distributed computing model in MapReduce is well suited to implementing ensemble learning. This paper combines the advantages of C4.5, ensemble learning, and the MapReduce computing model, and proposes a new method, MReC4.5, for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes benefits the effectiveness of classification modeling, and that serialization operations at the model level make the MReC4.5 classifier "construct once, use anywhere".
Keywords :
data mining; decision trees; grid computing; learning (artificial intelligence); C4.5 ensemble classification; MReC4.5; MapReduce; classification; distributed computing; ensemble learning; parallel computing; serialization operations; classification algorithms; cloud computing; computer science; concurrent computing; parallel programming; testing; training data
Conference_Title :
ChinaGrid Annual Conference, 2009. ChinaGrid '09. Fourth
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-0-7695-3818-1
DOI :
10.1109/ChinaGrid.2009.39