Title :
Parallelization of decision tree algorithm and its performance evaluation
Author :
Kubota, Kazuto ; Nakase, Akihiko ; Sakai, Hiroshi ; Oyanagi, Shigeru
Author_Institution :
Real World Computing Partnership, Kanagawa, Japan
Abstract :
Data mining is a typical application of high-performance computing in the business field, and an efficient data mining system that can deal with huge amounts of data is desired. This paper describes the parallel processing of decision tree generation, a typical algorithm for classification over large databases. C4.5, a free software package, is parallelized for an SMP machine using a thread library. Parallelism in generating a decision tree can be classified into intra-node parallelism and inter-node parallelism. Intra-node parallelism can be further classified into record parallelism, attribute parallelism, and their combination. We implemented these four kinds of parallelizing methods and evaluated their effects with four kinds of test data. The results show that there is a relation between the characteristics of the data and the parallelizing methods, and that a combination of multiple parallelizing methods is the most effective.
Keywords :
data mining; decision trees; parallel processing; performance evaluation; public domain software; SMP machine; attribute parallelism; decision tree algorithm; free software C4.5; high performance computing; inter-node parallelism; intra-node parallelism; large database; multiple parallelizing methods; parallelization
Conference_Title :
High Performance Computing in the Asia-Pacific Region, 2000. Proceedings. The Fourth International Conference/Exhibition on
Conference_Location :
Beijing, China
Print_ISBN :
0-7695-0589-2
DOI :
10.1109/HPC.2000.843500