Title :
A Dynamic Sampling Framework for Multi-class Imbalanced Data
Author :
Debowski, B. ; Areibi, Shawki ; Grewal, Gary ; Tempelman, J.
Author_Institution :
Sch. of Eng., Univ. of Guelph, Guelph, ON, Canada
Abstract :
In this paper we present a Dynamic Sampling Framework for use with multi-class imbalanced data containing any number of classes. The framework makes use of existing sampling techniques such as RUS, ROS, and SMOTE and ties the classification algorithm into the sampling process in a wrapper-like manner. In doing so, the framework is able to search for a desirably sampled training set, thus eliminating the need to specify a target distribution and automatically tuning the training set distribution to the classification algorithm's learning preferences. This is important when re-sampling multi-class data, where manually searching for an appropriate target distribution would be a daunting task. We test both our Dynamic Sampling approach and traditional Static Sampling using RUS, ROS, SMOTE, ROS+RUS, and SMOTE+RUS with several classification algorithms on a four-class, highly imbalanced data set. We compare the results of Static Sampling and Dynamic Sampling and find that, overall, both techniques are able to raise Recall for the highest minority classes, but Dynamic Sampling is also able to maintain or raise Recall for the majority classes. Dynamic Sampling is also more robust and resilient overall, and is better able to sustain classifier Accuracy and to raise G-Mean and Minimum F-Measures.
Keywords :
data mining; pattern classification; sampling methods; statistical distributions; G-Mean; ROS; RUS; SMOTE; classification algorithm; classification algorithm learning preferences; dynamic sampling; dynamic sampling framework; minimum F-measures; multiclass data re-sampling; multiclass imbalanced data; sampled training set; sampling process; sampling techniques; static sampling; target distribution; training set distribution; Accuracy; Algorithm design and analysis; Artificial neural networks; Educational institutions; Heuristic algorithms; Niobium; Training; Dynamic Sampling; Imbalanced Data; Multi-class;
Conference_Title :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
DOI :
10.1109/ICMLA.2012.144