Title :
K-means clustering based SVM ensemble methods for imbalanced data problem
Author :
Jaedong Lee ; Jee-Hyong Lee
Author_Institution :
Dept. of Electr. & Comput. Eng., Sungkyunkwan Univ., Suwon, South Korea
Abstract :
When one class contains significantly more or fewer samples than another, classifiers learned by machine learning algorithms tend to generalize poorly and become biased toward one specific class; this is called the imbalanced data problem. In this paper, we propose a novel method to solve the imbalanced data problem. We first divide the data into clusters using the K-means clustering algorithm and then build a classifier for each cluster using the Support Vector Machine (SVM) method. Before building each cluster's classifier, we balance that cluster's data using data sampling techniques. After a classifier has been built for every cluster, we evaluate each classifier's performance on validation data. The final classification of the test data is computed by aggregating the classification results of all clusters. The aggregation uses not only the output of each cluster's classifier but also the credit of each classifier and the membership of each data point in each cluster. We have verified that the proposed classification method performs better than existing machine learning algorithms on the imbalanced data classification problem.
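The pipeline described in the abstract can be sketched in plain Python as below. This is a minimal, hypothetical illustration, not the authors' implementation. The abstract does not specify the sampling technique, SVM kernel or solver, credit measure, or membership function, so the sketch assumes random oversampling, a linear SVM trained with the Pegasos subgradient method, balanced accuracy on each cluster's share of the validation data as the "credit", and a fuzzy c-means style membership proportional to 1/d² to each centroid. All function names (`kmeans`, `oversample`, `train_linear_svm`, `balanced_accuracy`, `fit`, `predict`) are illustrative.

```python
import math
import random

# Hypothetical sketch for binary labels in {-1, +1}. Assumptions, not the paper's choices:
# random oversampling, a linear SVM trained with Pegasos, balanced accuracy as "credit",
# and fuzzy-c-means-style membership (inverse squared distance to each centroid).

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))

def kmeans(points, k, rng, iters=50):
    """Lloyd's algorithm; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def oversample(X, y, rng):
    """Random oversampling: duplicate minority samples until both classes are equal in size."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    small, big = sorted([pos, neg], key=len)
    idx = pos + neg + [rng.choice(small) for _ in range(len(big) - len(small))]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_linear_svm(X, y, rng, lam=0.01, epochs=100):
    """Pegasos (stochastic subgradient descent on the L2-regularized hinge loss).

    Features are centered on the training mean and a constant 1.0 is appended as a bias feature.
    """
    mean = [sum(col) / len(X) for col in zip(*X)]
    Z = [[v - m for v, m in zip(x, mean)] + [1.0] for x in X]
    w = [0.0] * len(Z[0])
    order, t = list(range(len(Z))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(a * b for a, b in zip(w, Z[i]))
            w = [(1 - eta * lam) * a for a in w]  # shrink step from the L2 term
            if margin < 1:  # hinge loss active: move toward y_i * z_i
                w = [a + eta * y[i] * b for a, b in zip(w, Z[i])]

    def classify(x):
        z = [v - m for v, m in zip(x, mean)] + [1.0]
        return 1 if sum(a * b for a, b in zip(w, z)) >= 0 else -1
    return classify

def balanced_accuracy(clf, X, y):
    """Mean per-class recall over the classes present in y."""
    recalls = []
    for c in (1, -1):
        idx = [i for i, t in enumerate(y) if t == c]
        if idx:
            recalls.append(sum(clf(X[i]) == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def fit(X, y, X_val, y_val, k=2, seed=0):
    """Cluster, balance and train one SVM per cluster, then score each on its validation share."""
    rng = random.Random(seed)
    centroids = kmeans(X, k, rng)
    models = []  # one (classifier, credit) pair per cluster
    for j in range(k):
        idx = [i for i, p in enumerate(X) if nearest(p, centroids) == j]
        Xj, yj = [X[i] for i in idx], [y[i] for i in idx]
        if not yj:  # empty cluster: zero credit, so it never affects the vote
            models.append((lambda x: 1, 0.0))
            continue
        if len(set(yj)) == 1:  # single-class cluster: constant vote for that class
            clf = lambda x, label=yj[0]: label
        else:
            clf = train_linear_svm(*oversample(Xj, yj, rng), rng)
        vidx = [i for i, p in enumerate(X_val) if nearest(p, centroids) == j]
        credit = (balanced_accuracy(clf, [X_val[i] for i in vidx], [y_val[i] for i in vidx])
                  if vidx else 0.5)  # no validation points in this cluster: chance level
        models.append((clf, credit))
    return centroids, models

def predict(x, centroids, models, eps=1e-12):
    """Vote weighted by membership (proportional to 1/d^2) times each classifier's credit."""
    inv = [1.0 / (dist(x, c) ** 2 + eps) for c in centroids]
    total = sum(inv)
    score = sum((v / total) * credit * clf(x) for v, (clf, credit) in zip(inv, models))
    return 1 if score >= 0 else -1
```

Weighting each vote by membership means a per-cluster SVM, which only saw data from its own region, has little influence far from its centroid; weighting by credit further down-weights clusters whose classifier performed poorly on the validation data.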
Keywords :
learning (artificial intelligence); pattern classification; pattern clustering; support vector machines; SVM ensemble method; cluster classification; imbalanced data classification problem; k-means clustering; machine learning algorithm; support vector machine; Clustering algorithms; Machine learning algorithms; Pattern recognition; Rain; Support vector machines; Training data; data membership; imbalanced data
Conference_Title :
2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS)
DOI :
10.1109/SCIS-ISIS.2014.7044861