Title :
Improving classification performance for the minority class in highly imbalanced dataset using boosting
Author :
Abouelenien, Mohamed; Yuan, Xiaohui; Duraisamy, Prakash; Yuan, Xiaojing
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of North Texas, Denton, TX, USA
Abstract :
Data imbalance is common in many medical and biological datasets and usually degrades generalization performance. In this article, we present a novel boosting method that addresses two important questions in learning from imbalanced datasets: (1) how to maximize classification performance on the minority instances without compromising performance on the majority instances, and (2) how to select training instances that give a comprehensive representation of the data distribution while avoiding long computation times. Our method maximizes the use of the available samples, giving priority to the minority samples. Each base classifier is weighted by its sensitivity, computed on the training examples. Using synthetic and real-world datasets, we demonstrate that our method improves both sensitivity and accuracy without a major reduction in specificity. Compared with AdaBoost, our method took much less time, which makes it applicable to real-world problems involving large amounts of data.
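The abstract states only that training instances are selected with priority to the minority class and that base classifiers are weighted by their sensitivities; it does not specify the selection rule or the base learner. The following is a hypothetical sketch, not the authors' algorithm, that illustrates sensitivity-weighted voting under these assumptions: every round reuses all minority samples plus one disjoint chunk of the majority samples (so each majority sample is used once), the base learner is a simple nearest-centroid classifier, and each classifier's weight is its sensitivity (true positive rate on the minority class, label 1) over the full training set. The names `train_nearest_centroid`, `sensitivity`, `fit_ensemble`, and `ensemble_predict` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Classifier = Callable[[Point], int]  # returns 1 (minority) or 0 (majority)


def train_nearest_centroid(X: List[Point], y: List[int]) -> Classifier:
    """Hypothetical base learner: assign a point to the closer class centroid."""
    def centroid(points: List[Point]) -> List[float]:
        n = len(points)
        return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

    c_min = centroid([x for x, t in zip(X, y) if t == 1])
    c_maj = centroid([x for x, t in zip(X, y) if t == 0])

    def classify(x: Point) -> int:
        # Ties go to the minority class.
        return 1 if math.dist(x, c_min) <= math.dist(x, c_maj) else 0

    return classify


def sensitivity(clf: Classifier, X: List[Point], y: List[int]) -> float:
    """Fraction of minority (label 1) samples that clf labels as minority."""
    positives = [x for x, t in zip(X, y) if t == 1]
    if not positives:
        return 0.0
    return sum(clf(x) for x in positives) / len(positives)


def fit_ensemble(X: List[Point], y: List[int], n_rounds: int) -> List[Tuple[float, Classifier]]:
    """Assumed scheme: all minority samples plus one disjoint majority chunk per round."""
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    n_rounds = max(1, min(n_rounds, len(majority)))  # avoid empty chunks
    chunks = [majority[i::n_rounds] for i in range(n_rounds)]

    ensemble = []
    for chunk in chunks:
        X_round = minority + chunk
        y_round = [1] * len(minority) + [0] * len(chunk)
        clf = train_nearest_centroid(X_round, y_round)
        # Weight each base classifier by its sensitivity on the training set.
        ensemble.append((sensitivity(clf, X, y), clf))
    return ensemble


def ensemble_predict(ensemble: List[Tuple[float, Classifier]], x: Point) -> int:
    """Sensitivity-weighted majority vote."""
    total = sum(w for w, _ in ensemble)
    if total == 0:
        return 0
    minority_votes = sum(w for w, clf in ensemble if clf(x) == 1)
    return 1 if minority_votes >= total / 2 else 0
```

For example, with six majority points clustered near the origin and two minority points near (5, 5), `fit_ensemble(X, y, n_rounds=3)` trains three classifiers on the same two minority points and different majority subsets. Note that weighting purely by sensitivity rewards classifiers that over-predict the minority class; the abstract reports that specificity is largely preserved, so the authors' actual weighting and selection scheme may include safeguards not captured here.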
Keywords :
learning (artificial intelligence); pattern classification; AdaBoost; base classifier sensitivity; biological data; boosting method; classification performance; data distribution; highly imbalanced dataset; majority instances; medical data; minority class; minority instances; performance improvement; real-world datasets; synthetic datasets; training instance selection; Accuracy; Boosting; Image segmentation; Sensitivity; Silicon; Support vector machines; Training; Classification
Conference_Title :
2012 Third International Conference on Computing Communication & Networking Technologies (ICCCNT)
Conference_Location :
Coimbatore
DOI :
10.1109/ICCCNT.2012.6477850