• DocumentCode
    1468542
  • Title

    Maximum Ambiguity-Based Sample Selection in Fuzzy Decision Tree Induction

  • Author

    Wang, Xi-Zhao ; Dong, Ling-Cai ; Yan, Jian-Hui

  • Author_Institution
    Dept. of Math. & Comput. Sci., Hebei Univ., Baoding, China
  • Volume
    24
  • Issue
    8
  • fYear
    2012
  • Firstpage
    1491
  • Lastpage
    1505
  • Abstract
    Sample selection aims to choose a number of representative samples from a large database so that a learning algorithm can achieve a reduced computational cost and an improved learning accuracy. This paper proposes a new sample selection mechanism, i.e., maximum ambiguity-based sample selection in fuzzy decision tree induction. In contrast to existing sample selection methods, this mechanism selects samples based on the principle of maximal classification ambiguity. The major advantage of this mechanism is that the adjustment of the fuzzy decision tree is minimized when adding selected samples to the training set. This advantage is confirmed via a theoretical analysis of the leaf nodes' frequency in the decision trees. The decision tree generated from the selected samples usually performs better than the one generated from the original database. Furthermore, experimental results show that the generalization ability of the tree based on our selection mechanism is far superior to that of a tree based on a random selection mechanism.
  • Keywords
    data handling; database management systems; decision trees; fuzzy set theory; learning (artificial intelligence); computational cost; fuzzy decision tree induction; large database; leaf nodes frequency; learning accuracy; learning algorithm; maximal classification ambiguity; maximum ambiguity based sample selection; random selection mechanism; representative samples; sample selection mechanism; Decision trees; Entropy; Measurement uncertainty; Pragmatics; Probability distribution; Training; Uncertainty; Learning; fuzzy decision tree; sample selection; uncertainty
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2011.67
  • Filename
    5728816