DocumentCode
1906842
Title
Bootstrap Sampling Based Data Cleaning and Maximum Entropy SVMs for Large Datasets
Author
Senzhang Wang ; Zhoujun Li ; Xiaoming Zhang
Author_Institution
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
Volume
1
fYear
2012
fDate
7-9 Nov. 2012
Firstpage
1151
Lastpage
1156
Abstract
Support Vector Machines (SVMs) are popular machine learning algorithms based on Statistical Learning Theory (SLT). However, traditional solutions suffer from O(n^2) time complexity. In this paper, a novel two-stage informative pattern abstraction algorithm is proposed. The first stage is data cleaning based on bootstrap sampling: a set of weak SVM classifiers is trained on small sampled datasets, and training data correctly classified by all the weak classifiers are cleaned out. In the second stage, to further improve the performance of the final classifier and reduce training time, two novel informative pattern extraction algorithms based on entropy maximization SVMs are proposed. Empirical studies show that our approach effectively reduces the size of the training datasets and the computational cost, outperforming the state-of-the-art SVM training algorithms PEGASOS, RSVM and LIBLINEAR SVM while achieving comparable classification accuracy.
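The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weak learner here is a simple stochastic-subgradient linear SVM chosen only to keep the example dependency-free, and every function name and parameter (train_weak_svm, keep_informative, bootstrap_clean, n_classifiers, sample_size, lam, steps) is hypothetical. The second, entropy-maximization stage is not specified in enough detail in this record to sketch.

```python
import random


def predict(w, x):
    """Sign of a linear score; the last entry of w acts as the bias term."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if score >= 0 else -1


def train_weak_svm(X, y, lam=0.01, steps=200, rng=None):
    """Stand-in weak learner (assumption): linear SVM trained by
    stochastic subgradient descent on the regularised hinge loss."""
    rng = rng or random.Random(0)
    w = [0.0] * (len(X[0]) + 1)           # weights plus one bias entry
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))
        xi = list(X[i]) + [1.0]           # constant feature for the bias
        eta = 1.0 / (lam * t)             # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
        w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation shrink
        if margin < 1.0:                  # hinge loss is active
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def keep_informative(X, y, classifiers):
    """Indices of points misclassified by at least one classifier.
    Points that every classifier gets right are dropped."""
    return [i for i, (x, label) in enumerate(zip(X, y))
            if any(predict(w, x) != label for w in classifiers)]


def bootstrap_clean(X, y, n_classifiers=5, sample_size=20, seed=0):
    """Train weak SVMs on bootstrap samples, then clean the training set."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(n_classifiers):
        # Bootstrap sample: draw indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(sample_size)]
        classifiers.append(
            train_weak_svm([X[i] for i in idx], [y[i] for i in idx], rng=rng))
    return keep_informative(X, y, classifiers), classifiers
```

Labels are assumed to be +1 or -1. Calling bootstrap_clean(X, y) returns the indices of the retained training points together with the weak classifiers, so a final classifier could then be trained on the retained subset only.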
Keywords
computational complexity; data handling; learning (artificial intelligence); pattern classification; sampling methods; support vector machines; LIBLINEAR SVM; O(n^2) time complexity; PEGASOS; RSVM; SLT; SVM classifiers; SVM training algorithms; bootstrap sampling based data cleaning; classification accuracy; entropy maximization SVM; informative pattern extraction algorithms; large datasets; machine learning algorithm; maximum entropy SVM; statistical learning theory; support vector machines; two-stage informative pattern abstraction algorithm; Accuracy; Cleaning; Data mining; Information entropy; Support vector machines; Training; Training data; SVMs; bootstrap sampling; entropy maximization
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
Conference_Location
Athens
ISSN
1082-3409
Print_ISBN
978-1-4799-0227-9
Type
conf
DOI
10.1109/ICTAI.2012.164
Filename
6495181