DocumentCode
2291831
Title
SkewBoost: An algorithm for classifying imbalanced datasets
Author
Hukerikar, Saumil ; Tumma, Ashwin ; Nikam, Akshay ; Attar, Vahida
Author_Institution
Dept. of Comput. Eng. & Inf. Technol., Coll. of Eng. Pune, Pune, India
fYear
2011
fDate
15-17 Sept. 2011
Firstpage
46
Lastpage
52
Abstract
Many real-world data sets have an imbalanced distribution of instances. Learning from such data sets biases the classifier towards the majority class, so it tends to misclassify minority class samples. In this paper, we present a technique, SkewBoost, which classifies the minority instances correctly without compromising much on the correct classification of the majority instances. In SkewBoost, minority and majority instances are identified during execution of the boosting algorithm. A variation of SMOTE is used to create synthetic minority instances, which are then added to the training set, and the total weight is rebalanced. After each iteration of the boosting algorithm, the weight of each instance is modified to focus more on the misclassified instances. A cost-sensitive approach is adopted to reweight the instances following every iteration. The method is evaluated in terms of F-measure, G-mean, AUC, recall and precision on imbalanced data sets, and compared against previously published results for algorithms on imbalanced data sets.
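The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of precision, recall, F-measure and G-mean computed from confusion-matrix counts, treating the minority class as the positive class. The function name `imbalance_metrics` and the `minority_label` parameter are hypothetical and chosen for illustration; this is not the authors' evaluation code, and AUC is omitted because it needs classifier scores rather than hard labels.

```python
import math


def imbalance_metrics(y_true, y_pred, minority_label=1):
    """Compute precision, recall, F-measure and G-mean.

    The minority class is treated as the positive class. All names here
    are illustrative assumptions, not taken from the SkewBoost paper.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == minority_label:
            if pred == minority_label:
                tp += 1  # minority instance correctly identified
            else:
                fn += 1  # minority instance missed
        else:
            if pred == minority_label:
                fp += 1  # majority instance flagged as minority
            else:
                tn += 1  # majority instance correctly identified

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # G-mean: geometric mean of the per-class recalls.
    g_mean = math.sqrt(recall * specificity)

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Toy data: 4 minority (1) and 16 majority (0) instances.
    y_true = [1, 1, 1, 1] + [0] * 16
    y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 14
    print(imbalance_metrics(y_true, y_pred))
```

G-mean is useful on skewed data because it drops to zero whenever either class is entirely misclassified, whereas plain accuracy can stay high if the classifier simply predicts the majority class.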
Keywords
data handling; iterative methods; learning (artificial intelligence); pattern classification; AUC; F-measure; G-mean; SMOTE variation; SkewBoost technique; boosting algorithm; cost-sensitive approach; imbalanced dataset classification algorithm; precision; recall; synthetic minority instances; Accuracy; Boosting; Classification algorithms; Data mining; Measurement; boosting; instance weights; minority class; over sampling;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Communication Technology (ICCCT), 2011 2nd International Conference on
Conference_Location
Allahabad
Print_ISBN
978-1-4577-1385-9
Type
conf
DOI
10.1109/ICCCT.2011.6075185
Filename
6075185