• DocumentCode
    1906842
  • Title

    Bootstrap Sampling Based Data Cleaning and Maximum Entropy SVMs for Large Datasets

  • Author

    Senzhang Wang ; Zhoujun Li ; Xiaoming Zhang

  • Author_Institution
    State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
  • Volume
    1
  • fYear
    2012
  • fDate
    7-9 Nov. 2012
  • Firstpage
    1151
  • Lastpage
    1156
  • Abstract
    Support Vector Machines (SVMs) are a popular family of machine learning algorithms based on Statistical Learning Theory (SLT). However, traditional solvers suffer from O(n^2) time complexity. In this paper, a novel two-stage informative pattern abstraction algorithm is proposed. The first stage is data cleaning based on bootstrap sampling: a bundle of weak SVM classifiers is trained on small sampled datasets, and training examples correctly classified by all the weak classifiers are cleaned out of the training set. In the second stage, to further improve the performance of the final classifier and reduce training time, two novel informative pattern extraction algorithms based on entropy maximization SVMs are proposed. Empirical studies show that the approach is effective in reducing both the size of the training dataset and the computational cost, outperforming the state-of-the-art SVM training algorithms PEGASOS, RSVM and LIBLINEAR SVM with comparable classification accuracy.
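The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weak learner here is a small linear SVM trained by stochastic subgradient descent, and the function names (`train_linear_svm`, `predict`, `bootstrap_clean`) and parameters (`n_classifiers`, `sample_size`, `lam`, `epochs`) are hypothetical choices made for illustration. It also assumes that "cleaned" means "removed from the training set", and it does not cover the entropy-maximization second stage.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=10, seed=0):
    """Train a linear SVM (hinge loss, L2 penalty) by stochastic subgradient descent.

    Assumption: this simple solver stands in for the paper's weak SVM classifiers.
    Labels are expected to be +1 or -1. The last weight acts as a bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink the weights (gradient of the L2 penalty).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return the predicted label (+1 or -1) for a single example."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def bootstrap_clean(X, y, n_classifiers=5, sample_size=20, seed=0):
    """Keep only the examples that at least one weak classifier misclassifies.

    Each weak classifier is trained on a bootstrap sample (drawn with
    replacement) of `sample_size` examples. Examples that every weak
    classifier labels correctly are dropped.
    """
    rng = random.Random(seed)
    models = []
    for k in range(n_classifiers):
        idx = [rng.randrange(len(X)) for _ in range(sample_size)]
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        models.append(train_linear_svm(Xs, ys, seed=seed + k))
    kept = [
        i for i in range(len(X))
        if any(predict(w, X[i]) != y[i] for w in models)
    ]
    return kept, models
```

The indices returned in `kept` identify the reduced training set that a second stage would work on. How many examples survive depends on the data and on the hypothetical settings above.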
  • Keywords
    computational complexity; data handling; learning (artificial intelligence); pattern classification; sampling methods; support vector machines; LIBLINEAR SVM; O(n^2) time complexity; PEGASOS; RSVM; SLT; SVM classifiers; SVM training algorithms; bootstrap sampling based data cleaning; classification accuracy; entropy maximization SVM; informative pattern extraction algorithms; large datasets; machine learning algorithm; maximum entropy SVM; statistical learning theory; support vector machines; two-stage informative pattern abstraction algorithm; Accuracy; Cleaning; Data mining; Information entropy; Support vector machines; Training; Training data; SVMs; bootstrap sampling; entropy maximization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
  • Conference_Location
    Athens
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-0227-9
  • Type

    conf

  • DOI
    10.1109/ICTAI.2012.164
  • Filename
    6495181