• DocumentCode
    2892313
  • Title

    Evaluation of SMOTE for High-Dimensional Class-Imbalanced Microarray Data

  • Author

    Blagus, R. ; Lusa, L.

  • Author_Institution
    Inst. for Biostat. & Med. Inf., Univ. of Ljubljana, Ljubljana, Slovenia
  • Volume
    2
  • fYear
    2012
  • fDate
    12-15 Dec. 2012
  • Firstpage
    89
  • Lastpage
    94
  • Abstract
    Synthetic Minority Oversampling TEchnique (SMOTE) is a popular oversampling method that was proposed to improve on random oversampling, but its behavior on high-dimensional data has not been thoroughly investigated. In this paper we evaluate the performance of SMOTE on high-dimensional data, using gene expression microarray data. We observe that, for most classifiers, SMOTE does not attenuate the bias towards classification in the majority class, and that it is less effective than random undersampling. SMOTE is beneficial for k-NN classifiers based on the Euclidean distance if the number of variables is reduced by performing some type of variable selection, and the benefit is larger if more neighbors are used. If variable selection is not performed, then k-NN classification is counterintuitively biased towards the minority class, so SMOTE for k-NN without variable selection should not be used in practice.
  • Keywords
    biology computing; lab-on-a-chip; pattern classification; random processes; sampling methods; Euclidean distance-based k-NN classifiers; SMOTE evaluation; gene expression microarray data; high-dimensional class-imbalanced microarray data; high-dimensional data; minority class; random oversampling method; random undersampling; synthetic minority oversampling technique; Accuracy; Erbium; Gene expression; Input variables; Radio frequency; Support vector machines; Training; SMOTE; class-imbalance; high-dimensional;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2012 11th International Conference on
  • Conference_Location
    Boca Raton, FL
  • Print_ISBN
    978-1-4673-4651-1
  • Type

    conf

  • DOI
    10.1109/ICMLA.2012.183
  • Filename
    6406733
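
The abstract contrasts SMOTE with random oversampling. As a rough illustration of the interpolation step that distinguishes the two, here is a minimal NumPy sketch. It is not the authors' code: the function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example, and it assumes at least two minority-class samples and Euclidean distance.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbors.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # a sample cannot be its own neighbor

    # Pairwise Euclidean distances among minority samples.
    # This builds an (n, n, p) array, fine for a small minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude self-matches

    # Indices of the k nearest neighbors of every minority sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a uniform random weight in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy data: 10 minority samples with 50 variables (made-up numbers).
X_min = np.random.default_rng(1).normal(size=(10, 50))
synthetic = smote_sketch(X_min, n_new=20, k=3)
print(synthetic.shape)  # (20, 50)
```

Random oversampling would instead duplicate rows of `X_min` unchanged. Because the neighbors here are found in the full variable space, the sketch also shows where a variable selection step, as discussed in the abstract, would have to come first.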