• DocumentCode
    2494922
  • Title
    Efficient resampling methods for training support vector machines with imbalanced datasets
  • Author
    Batuwita, Rukshan ; Palade, Vasile
  • Author_Institution
    Comput. Lab., Oxford Univ., Oxford, UK
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Random undersampling and oversampling are simple, well-known resampling methods for addressing class imbalance. In this paper we show that random oversampling can produce better classification results than random undersampling, because oversampling increases the minority class recognition rate while sacrificing less of the majority class recognition rate than undersampling does. However, random oversampling substantially increases the computational cost of SVM training, owing to the added training examples. We therefore investigate efficient resampling methods that produce classification results comparable to those of random oversampling while using less data. The main idea of the proposed methods is to first select the most informative examples, those located close to the class boundary region, by using the separating hyperplane found by training an SVM model on the original imbalanced dataset, and then to use only those examples in resampling. We demonstrate that classification results comparable to those of random oversampling can be obtained with two sets of efficient resampling methods that use 50% and 75% less data, respectively, than the datasets generated by random oversampling.
  • Keywords
    data handling; sampling methods; support vector machines; SVM model; class imbalance problem; imbalanced datasets; random oversampling method; random undersampling method; resampling methods; Computational efficiency; Computational modeling; Digital signal processing; Machine learning; Testing; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2010 International Joint Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-6916-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2010.5596787
  • Filename
    5596787
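
The selection-then-resampling idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' procedure: the function names, the `keep_fraction` parameter, the use of |w·x + b| as the closeness measure, and the hand-picked hyperplane `(w, b)`, which stands in for the one that would be obtained by training an SVM on the original imbalanced dataset.

```python
import random

def decision_value(w, b, x):
    """Signed score w.x + b of a linear separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def select_near_boundary(X, y, w, b, keep_fraction):
    """Per class, keep the fraction of examples with the smallest |w.x + b|.

    Assumption: closeness to the boundary is measured by the absolute
    decision value; the abstract does not state the exact criterion.
    """
    selected = []
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        idx.sort(key=lambda i: abs(decision_value(w, b, X[i])))
        n_keep = max(1, round(keep_fraction * len(idx)))
        selected.extend(idx[:n_keep])
    return sorted(selected)

def oversample_minority(indices, y, rng):
    """Randomly duplicate smaller-class indices until all classes match the largest."""
    by_class = {}
    for i in indices:
        by_class.setdefault(y[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idx in by_class.values():
        extra = [rng.choice(idx) for _ in range(target - len(idx))]
        balanced.extend(idx + extra)
    return balanced

# Toy imbalanced data: 8 majority (-1) and 2 minority (+1) examples.
X = [(-float(k), 0.0) for k in range(1, 9)] + [(0.5, 0.0), (3.0, 0.0)]
y = [-1] * 8 + [1] * 2

# Hypothetical hyperplane; in practice it would come from a trained SVM.
w, b = (1.0, 0.0), 0.0

selected = select_near_boundary(X, y, w, b, keep_fraction=0.5)
balanced = oversample_minority(selected, y, random.Random(0))
print(selected)
print(balanced)
```

In this toy run only half of each class is kept before oversampling, so the balanced training set is smaller than the one plain random oversampling would build from all ten examples. The value 0.5 is illustrative and is not meant to reproduce the 50% or 75% reductions reported in the abstract.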