DocumentCode
2350412
Title
MReC4.5: C4.5 Ensemble Classification with MapReduce
Author
Wu, Gongqing ; Li, Haiguang ; Hu, Xuegang ; Bi, Yuanjun ; Zhang, Jing ; Wu, Xindong
Author_Institution
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
fYear
2009
fDate
21-22 Aug. 2009
Firstpage
249
Lastpage
255
Abstract
Classification is a significant technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on analyses of the MapReduce computing paradigm and the process of ensemble learning, we find that the parallel and distributed computing model in MapReduce is appropriate for implementing ensemble learning. This paper takes advantage of C4.5, ensemble learning, and the MapReduce computing model, and proposes a new method, MReC4.5, for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes benefits the effectiveness of classification modeling, and that serialization operations at the model level give the MReC4.5 classifier a "construct once, use anywhere" property.
Keywords
data mining; decision trees; grid computing; learning (artificial intelligence); C4.5 ensemble classification; MReC4.5; MapReduce; classification; distributed computing; ensemble learning; parallel computing; serialization operations; Classification algorithms; Cloud computing; Computer science; Concurrent computing; Parallel programming; Testing; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
ChinaGrid Annual Conference, 2009. ChinaGrid '09. Fourth
Conference_Location
Yantai, Shandong
Print_ISBN
978-0-7695-3818-1
Type
conf
DOI
10.1109/ChinaGrid.2009.39
Filename
5329047