Title :
A study on classifying imbalanced datasets
Author :
Lakshmi, T. Jaya ; Prasad, C. Siva Rama
Author_Institution :
Vasireddy Venkatadri Inst. of Technol., Guntur, India
Abstract :
Many real-world problems are modeled as binary classification problems, and often the samples of one class outnumber those of the other. This imbalance reduces prediction accuracy on minority class samples while still yielding a high overall accuracy. Ignoring the misclassification rate of the minority class causes severe problems in many applications, such as detecting fraudulent credit card transactions, medical diagnosis and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority and minority class samples equally. In this study, existing solutions to the class imbalance problem and the evaluation techniques commonly used for imbalanced data are reviewed. The solutions were applied to three real-world datasets. A combination of SMOTE and Bagging with Random Forest produced the best overall accuracy for the minority class.
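The abstract names SMOTE as part of the best-performing combination. The sketch below illustrates the core SMOTE idea (creating synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours) in plain Python. It is a simplified variant written for illustration, not the authors' implementation: the helper names `smote` and `k_nearest`, the random choice of base samples, and the default `k=5` are assumptions, since the record does not state the parameters used. The Bagging with Random Forest stage is not shown.

```python
import math
import random


def k_nearest(sample, others, k):
    """Return the k points in `others` closest to `sample` (Euclidean distance)."""
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: generate n_synthetic minority points, SMOTE-style.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Simplified variant; k=5 is an assumed default.)
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Neighbours are searched among the *other* minority samples only.
        candidates = minority[:i] + minority[i + 1:]
        neighbour = rng.choice(k_nearest(base, candidates, k))
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Usage: oversample a tiny two-feature minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points), new_points[0])
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class, which is why SMOTE tends to generalize better than simply duplicating minority rows.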
Keywords :
learning (artificial intelligence); pattern classification; SMOTE; bagging-with-random forest; binary classification problems; class imbalance problem; e-mail foldering; fraudulent credit card transactions; imbalanced dataset classification; majority class samples; medical diagnosis; minority class samples; real world datasets; Accuracy; Algorithm design and analysis; Bagging; Electronic mail; Fault diagnosis; Prediction algorithms; Radio frequency;
Conference_Title :
Networks & Soft Computing (ICNSC), 2014 First International Conference on
Conference_Location :
Guntur
Print_ISBN :
978-1-4799-3485-0
DOI :
10.1109/CNSC.2014.6906652