Title :
A distributed instance-weighted SVM algorithm on large-scale imbalanced datasets
Author :
Xiaoguang Wang ; Xuan Liu ; S. Matwin
Author_Institution :
Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
Abstract :
Extracting knowledge from very large volumes of data is challenging because conventional data mining techniques are not adapted to the resulting space and time requirements. The challenge is greater when the data is class imbalanced. Like many other machine learning algorithms, the support vector machine (SVM) has limited success when learning from imbalanced datasets, especially large ones. In this paper, we apply an instance-weighted variant of the SVM, combined with a parallel meta-learning algorithm implemented with MapReduce, to the big data class imbalance problem. We develop a symmetric weight boosting method to optimize the instance-weighted SVM. Experimental results on benchmark datasets and on big datasets from real applications show that the proposed algorithm is not only effective on the big data class imbalance problem but also significantly reduces the training computational complexity as the number of computing nodes increases.
Keywords :
Big Data; computational complexity; data mining; learning (artificial intelligence); parallel algorithms; support vector machines; Big Data class imbalance problem; MapReduce framework; benchmark datasets; big datasets; class imbalanced data; computing nodes; data mining techniques; data processing; distributed instance-weighted SVM algorithm; instance-weighted SVM optimization; knowledge extraction; large-scale imbalanced datasets; parallel meta-learning algorithm; support vector machine; symmetric weight boosting method; training computational complexity reduction; Algorithm design and analysis; Big data; Classification algorithms; Optimization; Support vector machines; Training; Training data; Class imbalance problem; Hadoop; MapReduce; SVMs;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004467