DocumentCode
654086
Title
A parallel algorithm to induce decision trees for large datasets
Author
Franco-Arcega, A. ; Suarez-Cansino, J. ; Flores-Flores, L.G.
Author_Institution
Inf. & Syst. Technol. Res. Center, Autonomous Univ. of the State of Hidalgo, Hidalgo, Mexico
fYear
2013
fDate
Oct. 30 2013-Nov. 1 2013
Firstpage
1
Lastpage
6
Abstract
This paper introduces a new parallel algorithm called ParDTLT and discusses some of its advantages over a set of well-known sequential and parallel algorithms. Parallel processing takes place at every node of the decision tree, which is built during the supervised training phase. Parallel tasks are distributed according to the attributes of the training objects, and the tree grows according to two criteria: the maximum number of training objects that each node can support and an entropic gain ratio criterion. Experiments compare the behavior of ParDTLT with that of the sequential algorithms C4.5, VFDT, YaDT and DTLT, and with that of the parallel algorithm called Synchronous. The experimental results show that ParDTLT preserves classification quality while reducing execution time.
Keywords
database management systems; decision trees; entropy; parallel algorithms; C4.5 algorithms; DTLT algorithms; ParDTLT; Synchronous algorithm; VFDT algorithms; YaDT algorithms; entropic gain ratio criterion; execution time; large datasets; parallel algorithm; parallel process; parallel task distribution; sequential algorithms; supervised training phase; Algorithm design and analysis; Decision trees; Parallel algorithms; Program processors; Time complexity; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Information, Communication and Automation Technologies (ICAT), 2013 XXIV International Symposium on
Conference_Location
Sarajevo
Type
conf
DOI
10.1109/ICAT.2013.6684045
Filename
6684045