Title :
A quasi-linear SVM combined with assembled SMOTE for imbalanced data classification
Author :
Bo Zhou ; Cheng Yang ; Haixiang Guo ; Jinglu Hu
Author_Institution :
Grad. Sch. of Inf., Production & Syst., Waseda Univ. of Hibikino, Kitakyushu, Japan
Abstract :
This paper addresses the classification of imbalanced datasets using an SVM together with an oversampling method. Traditional oversampling methods increase the overlap between classes, which leads to poor generalization of the SVM classifier. To solve this problem, this paper proposes a method that combines a quasi-linear SVM with assembled SMOTE. The quasi-linear SVM is an SVM with a quasi-linear kernel function. It approximates a nonlinear separation boundary by interpolating multiple local linear boundaries. The assembled SMOTE performs oversampling while taking the data distribution into account, and so avoids creating overlap between classes. First, a partition method based on the Minimal Spanning Tree is proposed to obtain local linear partitions, each of which can be separated by a single linear boundary. Second, using the information from these local linear partitions, the assembled SMOTE generates synthetic minority class samples. Finally, the quasi-linear SVM classifies the oversampled dataset in the same way as a standard SVM, using a composite quasi-linear kernel function. Experimental results on artificial data and benchmark datasets show that the proposed method is effective and improves classification performance.
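For orientation, the interpolation step that SMOTE-style oversampling builds on can be sketched as below. This is a minimal sketch of plain SMOTE, not the assembled SMOTE proposed in the paper: the abstract does not give enough detail to reproduce the partition-aware variant. The function name `smote`, its parameters `n_new`, `k`, and `seed`, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE sketch (hypothetical helper, not the paper's method).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of all other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + t * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
```

Because plain SMOTE picks neighbours without regard to the majority class, the synthetic points can fall into regions occupied by that class, which is the overlap problem the abstract describes.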
Keywords :
approximation theory; interpolation; pattern classification; sampling methods; support vector machines; trees (mathematics); approximate nonlinear separation boundary; artificial data datasets; assembled SMOTE; benchmark datasets; classification performance improvement; composite quasilinear kernel function; data distribution information; imbalanced dataset classification problem; interpolation; linear separation boundary; local linear partitioning method; minimal spanning tree; multilocal linear boundaries; oversampled dataset classification; oversampling method; quasilinear SVM; quasilinear kernel function; standard SVM; synthetic minority class samples; synthetic minority over-sampling technique; Interpolation; Kernel; Merging; Sociology; Standards; Statistics; Support vector machines;
Conference_Title :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6707035