Title :
Fast data sampling for large scale support vector machines
Author :
Rahul K. Sevakula;Mohammed Suhail;Nishchal K. Verma
Author_Institution :
Department of Electrical Engineering, Indian Institute of Technology Kanpur, India
Abstract :
Traditional algorithms for training Support Vector Machines (SVMs) have a worst-case time complexity of O(n³) and a space complexity of O(n²), which makes it difficult to scale the training to large datasets. In this paper, three algorithms are proposed for reducing the training dataset. The algorithms mine potential support vectors based on their closeness to the decision boundary and use only these points for learning the hyperplane. Closeness to the boundary is estimated using spatial distribution descriptors such as the median and quartiles. A distance-based algorithm is first proposed for the linear SVM, and the same is later extended to the kernel SVM using projection vectors. The proposed data sampling algorithms have a time complexity of O(n). In experiments, the algorithms drastically reduce the number of training samples and, accordingly, the training time of the SVM, generally with little compromise in classification accuracy.
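The abstract's idea of keeping only points near the decision boundary, judged with order statistics like the median, can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the boundary proxy (the midpoint between class means), the projection direction, and the median threshold are all illustrative assumptions.

```python
# Hypothetical sketch of closeness-to-boundary data sampling for a linear SVM.
# Assumption: the direction joining the two class means approximates the normal
# of the separating hyperplane, and the midpoint between the means approximates
# the boundary. Neither is taken from the paper; both are illustrative.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_near_boundary(class0, class1):
    """For each class, keep the points whose projection onto the line joining
    the class means lies closest to the midpoint (a boundary proxy), using the
    per-class median distance as the cut-off. The sort makes this O(n log n);
    a linear-time selection of the median would give the O(n) the paper claims."""
    m0, m1 = mean(class0), mean(class1)
    w = [b - a for a, b in zip(m0, m1)]                   # direction between means
    mid = dot([(a + b) / 2 for a, b in zip(m0, m1)], w)   # boundary proxy (scalar)
    kept = []
    for cls in (class0, class1):
        dists = [abs(dot(x, w) - mid) for x in cls]       # closeness to boundary
        med = sorted(dists)[len(dists) // 2]              # median as threshold
        kept.append([x for x, d in zip(cls, dists) if d <= med])
    return kept
```

On two well-separated 1-D clusters, the sampler discards the points farthest from the gap between the classes and retains the inner points, which are the plausible support vectors.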
Keywords :
"Training","Support vector machines","Time complexity","Kernel","Training data","Optimization"
Conference_Titel :
2015 IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (WCI)
DOI :
10.1109/WCI.2015.7495509