Title :
Disturbing Neighbors Ensembles of Trees for Imbalanced Data
Author :
Rodriguez, Jeffrey J. ; Diez-Pastor, J.F. ; Maudes, J. ; Garcia-Osorio, C.
Author_Institution :
Dept. of Civil Eng., University of Burgos, Burgos, Spain
Abstract :
Disturbing Neighbors (DN) is a method for generating classifier ensembles. Moreover, it can be combined with any other ensemble method, generally improving the results. This paper considers the application of these ensembles to imbalanced data: classification problems where the class proportions are significantly different. DN ensembles are compared and combined with Bagging, using three tree methods as base classifiers: conventional decision trees (C4.5); Hellinger distance decision trees (HDDT), a method designed for imbalanced data; and model trees (M5P), trees with linear models at the leaves. The methods are compared using two collections of imbalanced datasets, with 20 and 66 datasets, respectively. The best results are obtained by combining Bagging and DN, using conventional decision trees.
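The abstract names HDDT as a tree method designed for imbalanced data but does not state its split criterion. The following is a minimal sketch, assuming the commonly used two-class Hellinger distance between the per-branch distributions of positive and negative examples; the function name `hellinger_split_value` and its input layout are hypothetical and are not taken from the paper.

```python
from math import sqrt


def hellinger_split_value(branch_counts):
    """Hellinger distance of a candidate split for a two-class problem.

    branch_counts: list of (n_pos, n_neg) pairs, one pair per branch.
    Assumption: this is the usual two-class formulation, not code from the paper.
    """
    total_pos = sum(pos for pos, _ in branch_counts)
    total_neg = sum(neg for _, neg in branch_counts)
    if total_pos == 0 or total_neg == 0:
        # With a single class present, no split can separate anything.
        return 0.0
    # Compare the fraction of each class that falls into every branch.
    return sqrt(sum(
        (sqrt(pos / total_pos) - sqrt(neg / total_neg)) ** 2
        for pos, neg in branch_counts
    ))


# Same per-class branch fractions, with ten times more negatives in the second case.
mild = hellinger_split_value([(8, 100), (2, 900)])
severe = hellinger_split_value([(8, 1000), (2, 9000)])
```

Because the value depends only on the fraction of each class sent to each branch, `mild` and `severe` come out equal: changing the class ratio alone does not change the score, which is the property that makes this kind of criterion attractive for imbalanced data.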
Keywords :
data handling; decision trees; pattern classification; Bagging; Hellinger distance decision trees; classification problems; classifier ensembles; conventional decision trees; disturbing neighbors; imbalanced data; linear models; model trees; tree methods; Accuracy; Boosting; Data mining; Data models
Conference_Title :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
DOI :
10.1109/ICMLA.2012.181