DocumentCode :
2130662
Title :
Chi-Square Test Based Decision Trees Induction in Distributed Environment
Author :
Ouyang, Jie ; Patel, Nilesh ; Sethi, Ishwar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Oakland Univ., Rochester, MI
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
477
Lastpage :
485
Abstract :
Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data are present at one central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper describes a distributed learning algorithm that extends the original (centralized) CHAID algorithm to a distributed version. The distributed algorithm generates exactly the same results as its centralized counterpart. For completeness, a distributed quantization method is proposed so that continuous data can be processed by the algorithm. Experimental results for several well-known data sets are presented and compared with decision trees generated by CHAID on centrally stored data.
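The abstract does not say how the distributed version reproduces the centralized result exactly. The sketch below shows one property that would make this possible. It assumes the data are horizontally partitioned, meaning each site holds different records over the same attributes, and that sites share only aggregate counts. Under that assumption, a chi-square statistic computed from an attribute-value-by-class contingency table depends only on the table's counts, and those counts can be summed across sites. The function names (`local_counts`, `merge_counts`, `chi_square`) are hypothetical. This is not the authors' code. It also omits CHAID's category merging, its adjusted p-values, and the proposed distributed quantization.

```python
# Hypothetical sketch: assumes horizontally partitioned data and count sharing;
# not the paper's implementation.
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Table = Dict[Tuple[str, str], int]  # (attribute value, class label) -> count


def local_counts(rows: Iterable[Tuple[str, str]]) -> Table:
    """Count (attribute value, class label) pairs at a single site."""
    return dict(Counter(rows))


def merge_counts(tables: List[Table]) -> Table:
    """Sum per-site tables; counts are additive across disjoint record sets."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)  # Counter.update adds counts for each key
    return dict(total)


def chi_square(table: Table) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom of a contingency table."""
    values = sorted({v for v, _ in table})
    classes = sorted({c for _, c in table})
    n = sum(table.values())
    row = {v: sum(table.get((v, c), 0) for c in classes) for v in values}
    col = {c: sum(table.get((v, c), 0) for v in values) for c in classes}
    stat = 0.0
    for v in values:  # sorted iteration keeps the summation order deterministic
        for c in classes:
            expected = row[v] * col[c] / n
            observed = table.get((v, c), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(values) - 1) * (len(classes) - 1)
    return stat, dof
```

For example, with two sites holding different rows of the same (outlook, class) attribute pair, the merged statistic matches the one computed on the pooled rows:

```python
site_a = [("sunny", "no"), ("sunny", "no"), ("rain", "yes")]
site_b = [("overcast", "yes"), ("rain", "yes"), ("sunny", "yes"), ("rain", "no")]
distributed = chi_square(merge_counts([local_counts(site_a), local_counts(site_b)]))
centralized = chi_square(local_counts(site_a + site_b))
assert distributed == centralized
```

Because only the summed table is needed, each site transmits counts rather than raw records.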
Keywords :
data mining; decision trees; distributed databases; pattern classification; CHAID algorithm; Chi-square test; classification; decision trees induction; distributed environment; geographically dispersed locations; pattern recognition; Classification tree analysis; Distributed algorithms; Induction generators; Quantization; Testing; Training data; Distributed decision tree
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.37
Filename :
4733971