Title :
Optimized very fast decision tree with balanced classification accuracy and compact tree size
Author :
Yang, Hang ; Fong, Simon
Author_Institution :
Fac. of Sci. & Technol., Univ. of Macau, Macau, China
Abstract :
The Very Fast Decision Tree (VFDT) has been widely studied in data stream mining for more than a decade. In essence, VFDT mines a portion of an unbounded data stream at a time and updates the structure of the decision tree whenever new data arrive; hence its predictions improve as fresh data are fed in. Like the traditional decision trees from which it inherits information gain as the criterion for tree induction, VFDT may suffer from the same over-fitting problem, in which noise in the training data leads to excessive tree branches and therefore to a decline in prediction accuracy. This problem is aggravated in stream mining because limited run-time memory (for storing the whole decision tree) and reasonable accuracy are often the criteria for implementing VFDT. Post-pruning, a popular technique for keeping tree size in check in traditional decision trees, may not be applicable to VFDT in situ. In this paper, a new model that extends VFDT, called Optimized VFDT (OVFDT), is proposed for controlling the tree size while sustaining good prediction accuracy. This is enabled by using an adaptive tie threshold and incremental pruning in tree induction. Experimental results show that an optimal ratio of tree size to accuracy can be achieved by OVFDT.
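For readers unfamiliar with the tie parameter mentioned in the abstract, the following is a minimal sketch of the split test used in the standard VFDT formulation, which relies on the Hoeffding bound. It is background only and is not the OVFDT method: the abstract does not specify how the threshold is adapted or how incremental pruning works, so the sketch uses a fixed threshold. The function names, the default values for `delta` and `tie_threshold`, and the choice of `log2(num_classes)` as the range of information gain are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this epsilon of the mean
    observed over n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, num_classes=2,
                 delta=1e-7, tie_threshold=0.05):
    """Decide whether a leaf that has seen n examples should be split.

    best_gain / second_gain are the information gains of the two best
    candidate attributes. tie_threshold is FIXED here (assumption); the
    abstract states that OVFDT makes it adaptive but gives no details.
    """
    # Information gain is bounded by log2(number of classes).
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)

    # Case 1: the best attribute is better than the runner-up by more
    # than epsilon, so the choice is statistically reliable.
    clear_winner = (best_gain - second_gain) > eps

    # Case 2: epsilon has shrunk below the tie threshold, so the two
    # candidates are treated as tied and the split goes ahead anyway.
    tie = eps < tie_threshold

    return clear_winner or tie


if __name__ == "__main__":
    print(should_split(0.30, 0.05, n=200))    # large gap between candidates
    print(should_split(0.30, 0.25, n=200))    # small gap, few examples
    print(should_split(0.30, 0.25, n=5000))   # small gap, many examples
```

In this formulation a larger tie threshold causes ties to be broken sooner and the tree to grow faster, while a smaller one delays splits; this is the trade-off between tree size and accuracy that the abstract refers to.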
Keywords :
data mining; decision trees; pattern classification; classification accuracy; data stream mining; incremental pruning; information gain; optimized VFDT model; post-pruning technique; tree induction; very fast decision tree; Accuracy; Data mining; Decision trees; Estimation; Measurement; Noise; Optimization; Incremental Optimization; OVFDT; Stream Mining; Tree Pruning; Very Fast Decision Tree
Conference_Title :
Data Mining and Intelligent Information Technology Applications (ICMiA), 2011 3rd International Conference on
Conference_Location :
Macao
Print_ISBN :
978-1-4673-0231-9