• DocumentCode
    559649
  • Title
    Optimized very fast decision tree with balanced classification accuracy and compact tree size
  • Author
    Yang, Hang ; Fong, Simon
  • Author_Institution
    Fac. of Sci. & Technol., Univ. of Macau, Macau, China
  • fYear
    2011
  • fDate
    24-26 Oct. 2011
  • Firstpage
    57
  • Lastpage
    64
  • Abstract
    The Very Fast Decision Tree (VFDT) has been widely studied in data stream mining for more than a decade. In essence, VFDT mines one portion of an unbounded data stream at a time and updates the structure of the decision tree whenever new data arrive, so its predictions improve as fresh data are fed in. Having inherited information-gain-based tree induction from traditional decision trees, VFDT may suffer from the same over-fitting problem, in which noise in the training data leads to excessive tree branches and therefore a decline in prediction accuracy. This problem is aggravated in stream mining because limited run-time memory (for storing the whole decision tree) and reasonable accuracy are often the criteria for implementing VFDT. Post-pruning, a popular technique for keeping tree size in check in traditional decision trees, may not be applicable to VFDT in situ. This paper proposes a new model that extends VFDT, called Optimized VFDT (OVFDT), for controlling the tree size while sustaining good prediction accuracy. This is enabled by using an adaptive tie threshold and incremental pruning in tree induction. Experimental results show that OVFDT can achieve an optimal ratio of tree size to accuracy.
  • Keywords
    data mining; decision trees; pattern classification; classification accuracy; data stream mining; incremental pruning; information gain; optimized VFDT model; post-pruning technique; tree induction; very fast decision tree; Accuracy; Data mining; Decision trees; Estimation; Measurement; Noise; Optimization; Incremental Optimization; OVFDT; Stream Mining; Tree Pruning; Very Fast Decision Tree;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining and Intelligent Information Technology Applications (ICMiA), 2011 3rd International Conference on
  • Conference_Location
    Macao
  • Print_ISBN
    978-1-4673-0231-9
  • Type
    conf
  • Filename
    6108399