Title :
Decision Tree Algorithm based on Sampling
Author :
Xudong, Song ; Xiaolan, Cheng
Author_Institution :
Dalian Jiaotong Univ., Dalian
Abstract :
As databases grow, data mining algorithms face more demanding requirements for efficiency and accuracy. Mining large data sets requires large amounts of time and physical resources, and sampling is an effective way to reduce that cost. For large data sets, a new decision tree algorithm based on sampling is put forward. It selects small initial samples whose distribution is similar to that of the original data set for learning, and stops sampling according to time complexity requirements and convergence criteria. Compared with the existing flexible decision tree algorithm, the algorithm can reduce computation time and I/O complexity while maintaining the accuracy of the tree.
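The abstract does not spell out the sampling schedule or the stopping rule, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes simple random sampling as a stand-in for distribution-preserving sample selection, a one-level tree (a threshold stump) in place of a full decision tree, a geometric growth of the sample size, and a convergence test based on the change in holdout accuracy. All names and parameters (`fit_stump`, `sample_until_converged`, `start`, `growth`, `eps`, `time_budget_s`) are hypothetical.

```python
import random
import time


def fit_stump(rows):
    """Fit a one-level decision tree (threshold on x) to (x, label) rows.

    Stand-in learner (assumption): the paper builds full decision trees.
    """
    best = None
    for t in sorted({x for x, _ in rows}):
        # Try both labelings of the two sides of the threshold.
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def accuracy(model, rows):
    """Fraction of (x, label) rows the model classifies correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def sample_until_converged(data, holdout, start=50, growth=2, eps=0.01,
                           time_budget_s=5.0, seed=0):
    """Train on growing random samples until accuracy stabilizes.

    Stops when holdout accuracy changes by less than eps between two
    consecutive sample sizes, when the whole data set has been used,
    or when the time budget is exhausted (checked between rounds).
    These criteria are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    n = min(start, len(data))
    prev_acc = None
    while True:
        sample = rng.sample(data, n)  # simple random sample, no replacement
        model = fit_stump(sample)
        acc = accuracy(model, holdout)
        converged = prev_acc is not None and abs(acc - prev_acc) < eps
        if converged or n == len(data) or time.monotonic() >= deadline:
            return model, n, acc
        prev_acc = acc
        n = min(n * growth, len(data))
```

Calling `sample_until_converged(data, holdout)` with lists of `(x, label)` pairs returns the fitted model, the sample size at which sampling stopped, and the holdout accuracy at that point. The potential saving comes from stopping before the sample grows to the full data set.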
Keywords :
data mining; decision trees; convergence criteria; data mining algorithm; decision tree algorithm; time complexity; Classification tree analysis; Computer networks; Concurrent computing; Convergence; Data mining; Databases; Decision trees; Parallel processing; Partitioning algorithms; Sampling methods;
Conference_Title :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
DOI :
10.1109/NPC.2007.133