DocumentCode :
3756906
Title :
Investigating New Bootstrapping Approaches of Bagging Classifiers to Account for Class Imbalance in Bioinformatics Datasets
Author :
Alireza Fazelpour;Taghi M. Khoshgoftaar;David J. Dittman;Amri Napolitano
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Year :
2015
Firstpage :
987
Lastpage :
994
Abstract :
One major challenge posed by bioinformatics datasets is class imbalance, which occurs when one class has many more instances than the other class(es). Its undesirable effect on classification performance is compounded by the fact that, in general, the class with fewer instances is the class of interest. Practitioners in the field have used bagging to overcome class imbalance and improve classification performance. Our motivation for this study is to investigate whether changes to the bootstrapping step of bagging classifiers can further improve their performance. Specifically, these modifications make the bootstrapping process take class membership into account. We performed an extensive empirical study of four bootstrap approaches within the bagging framework, using three feature rankers, four feature subset sizes, and two base classifiers across 15 imbalanced bioinformatics datasets. Three of these bootstrap approaches were proposed and implemented by our research team for this study. Our results show that all new approaches improve performance over the classic bootstrap approach, with balanced bagging having the highest performance; however, the observed increases in performance are not statistically significant. We recommend the balanced bootstrap approach because it most frequently achieves the highest performance and because it generates fully balanced bootstrap datasets that account for the class imbalance problem. The novelty of this paper lies in proposing and implementing three new bootstrapping approaches and comparing their effect on the performance of bagging classifiers against the classic approach in the domain of bioinformatics.
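Illustration (not from the paper): the abstract does not spell out how the balanced bootstrap is constructed, so the following is only a minimal sketch under one plausible reading, in which each bootstrap dataset draws the same number of instances, with replacement, from every class. The function names `classic_bootstrap` and `balanced_bootstrap`, the 90/10 toy labels, and the choice of total sample size are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def classic_bootstrap(labels, rng):
    """Classic bootstrap: draw n indices uniformly with replacement."""
    n = len(labels)
    return [rng.randrange(n) for _ in range(n)]


def balanced_bootstrap(labels, rng):
    """Assumed balanced variant: equal draws (with replacement) per class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Assumption: keep the total size close to the original dataset size.
    per_class = len(labels) // len(by_class)
    sample = []
    for y in sorted(by_class):
        sample.extend(rng.choices(by_class[y], k=per_class))
    rng.shuffle(sample)
    return sample


# Toy imbalanced dataset: 90 majority-class and 10 minority-class labels.
labels = [0] * 90 + [1] * 10
rng = random.Random(0)

idx = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in idx)
print(counts[0], counts[1])
```

In a bagging ensemble, one such index list would be drawn per base classifier, and each classifier would be trained on the instances it selects. With the classic bootstrap the minority class stays rare in every sample, whereas this variant gives every base classifier a balanced training set.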
Keywords :
"Bagging","Bioinformatics","Training","Biological system modeling","Learning systems","Decision trees","Algorithm design and analysis"
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type :
conf
DOI :
10.1109/ICMLA.2015.42
Filename :
7424449