Title :
CLUS: A new hybrid sampling classification for imbalanced data
Author :
Prachuabsupakij, Wanthanee
Author_Institution :
Dept. of Inf. Technol., King Mongkut's Univ. of Technol. North Bangkok, Prachinburi, Thailand
Abstract :
This paper proposes CLUS (CLUSter-based hybrid sampling), a new hybrid sampling approach that improves classifier performance on two-class imbalanced datasets. The objective of this research is to develop an algorithm that can effectively classify two-class imbalanced datasets that have complicated distributions and large overlap between classes. These problems can cause learners to fail at classification. The contribution of CLUS is therefore to alleviate the large overlap between classes and to balance the class distribution. First, all instances are partitioned into k clusters using the k-means algorithm. Next, CLUS creates new subsets, each consisting of instances from different classes that have different characteristics. Then an oversampling method is applied to each subset. Finally, an SVM is trained on each resulting training set, and the final classification is made by majority vote. CLUS is tested on eight imbalanced benchmark datasets and assessed with two metrics, F-measure and AUC. The experimental results show that CLUS outperforms other methods, especially when the imbalance ratio is high.
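The abstract names the imbalance ratio, F-measure, and AUC without defining them. The sketch below shows one common way to compute all three for a two-class problem. It is not the paper's evaluation code: the function names, the majority-over-minority definition of the imbalance ratio, the choice of label 1 as the minority (positive) class, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 majority (0) and 2 minority (1) instances.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.3, 0.7, 0.9, 0.6]

print(imbalance_ratio(y_true))    # 4.0
print(f_measure(y_true, y_pred))  # 0.5
print(auc(y_true, scores))        # 0.9375
```

On this toy data the plain accuracy is 0.8, yet the F-measure on the minority class is only 0.5. That gap is why minority-focused metrics such as these are usually preferred over accuracy when classes are imbalanced.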
Keywords :
pattern classification; pattern clustering; sampling methods; AUC; CLUS; F-measure; cluster-based hybrid sampling approach; hybrid sampling classification; imbalanced ratio; k-mean algorithms; majority vote; oversampling method; two-class imbalanced datasets; Classification algorithms; Clustering algorithms; Diabetes; Glass; Ionosphere; Support vector machines; Training; classification; clustering; data mining; imbalanced data; sampling;
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2015 12th International Joint Conference on
Conference_Location :
Songkhla
DOI :
10.1109/JCSSE.2015.7219810