• DocumentCode
    108931
  • Title

    Ensembles of α-Trees for Imbalanced Classification Problems

  • Author

    Park, Yubin ; Ghosh, Joydeep

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
  • Volume
    26
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    131
  • Lastpage
    143
  • Abstract
    This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, both of which make extensive use of the properties of α-divergence. First, a novel splitting criterion based on α-divergence is shown to generalize several well-known splitting criteria, such as those used in C4.5 and CART. When the α-divergence splitting criterion is applied to imbalanced data, varying the value of α yields decision trees that tend to be less correlated (α-diversification). This increased diversity in an ensemble of such trees improves AUROC values across a range of minority class priors. The second ensemble uses the same α-trees as base classifiers, but applies a lift-aware stopping criterion during tree growth. The resulting ensemble produces a set of interpretable rules that provide higher lift values for a given coverage, a property that is highly desirable in applications such as direct marketing. Experimental results across many class-imbalanced data sets, including the BRFSS and MIMIC data sets from the medical community and several sets from UCI and KEEL, are provided to highlight the effectiveness of the proposed ensembles over a wide range of data distributions and degrees of class imbalance.
  • Keywords
    decision trees; pattern classification; α-divergence splitting criterion; α-diversification; α-trees ensembles; AUROC values; BRFSS data set; C4.5; CART; KEEL; MIMIC data sets; UCI; alpha trees; base classifiers; class-imbalanced data sets; data distributions; decision tree ensembles; imbalanced classification problems; lift-aware stopping criterion; minority class prior range; Entropy; Equations; Impurities; Measurement; Training; Training data; Data mining; ensemble classification; imbalanced data sets; lift
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2012.255
  • Filename
    6399474
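
The α-divergence splitting criterion described in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formula, so the code below assumes one common (Amari-style) parametrization of α-divergence and scores a candidate split by the divergence between the joint distribution of (branch, class) and the product of its marginals. The function names `alpha_divergence` and `alpha_gain` and the example counts are hypothetical, chosen only for illustration. Under this assumed form, the limit α → 1 is the KL divergence, so the score reduces to the information gain (in nats) that underlies C4.5-style criteria.

```python
import math


def alpha_divergence(p, q, alpha):
    """Alpha-divergence between two discrete distributions p and q.

    Assumed form (not taken from the record):
        (1 - sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha * (1 - alpha))
    with the KL divergence used as the alpha -> 1 limit.
    Assumes q_i > 0 wherever p_i > 0.
    """
    if alpha <= 0:
        raise ValueError("this sketch assumes alpha > 0")
    if abs(alpha - 1.0) < 1e-12:
        # Limit alpha -> 1: KL(p || q)
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return (1.0 - s) / (alpha * (1.0 - alpha))


def alpha_gain(counts, alpha):
    """Score a candidate split from a contingency table.

    counts[b][c] is the number of training examples of class c that fall
    into branch b. The score is the alpha-divergence between the joint
    distribution over (branch, class) and the product of its marginals.
    """
    total = float(sum(sum(row) for row in counts))
    n_classes = len(counts[0])
    branch_p = [sum(row) / total for row in counts]
    class_p = [sum(row[j] for row in counts) / total for j in range(n_classes)]
    joint = [c / total for row in counts for c in row]
    product = [bp * cp for bp in branch_p for cp in class_p]
    return alpha_divergence(joint, product, alpha)


# Hypothetical imbalanced split: 100 majority vs. 10 minority examples.
counts = [[90, 2], [10, 8]]
gains = {a: alpha_gain(counts, a) for a in (0.5, 1.0, 2.0)}
for a, g in gains.items():
    print(f"alpha={a}: gain={g:.4f}")
```

Scoring the same candidate splits under several values of α is one way to see how different α values can rank splits differently, which is the intuition behind the α-diversification mentioned in the abstract; the paper itself should be consulted for the exact criterion and its relation to CART.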