DocumentCode :
249103
Title :
A study on classifying imbalanced datasets
Author :
Lakshmi, T. Jaya ; Prasad, C. Siva Rama
Author_Institution :
Vasireddy Venkatadri Inst. of Technol., Guntur, India
fYear :
2014
fDate :
19-20 Aug. 2014
Firstpage :
141
Lastpage :
145
Abstract :
Many real-world problems are modeled as binary classification problems in which the samples of one class often outnumber those of the other. This imbalance reduces prediction accuracy on minority class samples while overall accuracy remains high. Ignoring the misclassification rate of the minority class causes severe problems in many cases, such as fraudulent credit card transactions, medical diagnosis, and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority and minority class samples equally. In this study, the existing solutions to the class imbalance problem and the evaluation techniques commonly used for it are reviewed. The solutions were applied to three real-world datasets. A combination of SMOTE and Bagging with Random Forest was observed to produce the best overall accuracy on the minority class.
Keywords :
learning (artificial intelligence); pattern classification; SMOTE; bagging-with-random forest; binary classification problems; class imbalance problem; e-mail foldering; fraudulent credit card transactions; imbalanced dataset classification; majority class samples; medical diagnosis; minority class samples; real world datasets; Accuracy; Algorithm design and analysis; Bagging; Electronic mail; Fault diagnosis; Prediction algorithms; Radio frequency;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Networks & Soft Computing (ICNSC), 2014 First International Conference on
Conference_Location :
Guntur
Print_ISBN :
978-1-4799-3485-0
Type :
conf
DOI :
10.1109/CNSC.2014.6906652
Filename :
6906652