DocumentCode
108931
Title
Ensembles of α-Trees for Imbalanced Classification Problems
Author
Park, Yubin ; Ghosh, Joydeep
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
Volume
26
Issue
1
fYear
2014
fDate
Jan. 2014
Firstpage
131
Lastpage
143
Abstract
This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, both of which make extensive use of properties of α-divergence. First, a novel splitting criterion based on α-divergence is shown to generalize several well-known splitting criteria, such as those used in C4.5 and CART. When the α-divergence splitting criterion is applied to imbalanced data, varying the value of α yields decision trees that tend to be less correlated (α-diversification). This increased diversity in an ensemble of such trees improves AUROC values across a range of minority class priors. The second ensemble uses the same α-trees as base classifiers but applies a lift-aware stopping criterion during tree growth. The resulting ensemble produces a set of interpretable rules that provide higher lift values for a given coverage, a property that is highly desirable in applications such as direct marketing. Experimental results across many class-imbalanced data sets, including the BRFSS and MIMIC data sets from the medical community and several sets from UCI and KEEL, are provided to highlight the effectiveness of the proposed ensembles over a wide range of data distributions and degrees of class imbalance.
Keywords
decision trees; pattern classification; α-divergence splitting criterion; α-diversification; α-trees ensembles; AUROC values; BRFSS data set; C4.5; CART; KEEL; MIMIC data sets; UCI; alpha trees; base classifiers; class-imbalanced data sets; data distributions; decision tree ensembles; imbalanced classification problems; lift-aware stopping criterion; minority class prior range; Decision trees; Entropy; Equations; Impurities; Measurement; Training; Training data; Data mining; decision trees; ensemble classification; imbalanced data sets; lift
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2012.255
Filename
6399474