• DocumentCode
    2211022
  • Title
    Local neighbourhood extension of SMOTE for mining imbalanced data
  • Author
    Maciejewski, Tomasz ; Stefanowski, Jerzy
  • Author_Institution
    Inst. of Comput. Sci., Poznan Univ. of Technol., Poznań, Poland
  • fYear
    2011
  • fDate
    11-15 April 2011
  • Firstpage
    104
  • Lastpage
    111
  • Abstract
    In this paper we discuss the problems of inducing classifiers from imbalanced data and of improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples of the minority class between an example and its closest neighbours from that class. However, SMOTE may also overgeneralize the minority class region, as it does not consider the distribution of neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely. In the experiments we compare this method with the original SMOTE and its two most closely related generalizations, Borderline and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
  • Keywords
    Bayes methods; data mining; pattern classification; SMOTE over-sampling method; focused resampling technique; imbalanced data mining; local neighbourhood extension; naive Bayes classifiers; Breast cancer; Data mining; Decision trees; Noise; Noise measurement; Sensitivity; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on
  • Conference_Location
    Paris
  • Print_ISBN
    978-1-4244-9926-7
  • Type
    conf
  • DOI
    10.1109/CIDM.2011.5949434
  • Filename
    5949434
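
The abstract describes SMOTE as generating synthetic minority examples between an example and its closest minority-class neighbours. A minimal sketch of that basic interpolation step is given below, assuming Euclidean distance and NumPy. It illustrates plain SMOTE only, not the LN-SMOTE method of the paper, whose details are not given in this record. The function name smote_sketch and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np

def smote_sketch(minority, n_new, k=5, seed=0):
    """Illustrative plain-SMOTE interpolation (hypothetical helper, not LN-SMOTE).

    minority: array-like of shape (n_samples, n_features), minority class only.
    n_new:    number of synthetic examples to generate.
    k:        number of nearest minority neighbours considered per example.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority examples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour

    # Indices of the k nearest minority neighbours of each example.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                    # random minority example
        nb = neighbours[base, rng.integers(k)]    # one of its k neighbours
        gap = rng.random()                        # position on the segment, in [0, 1)
        synthetic[i] = minority[base] + gap * (minority[nb] - minority[base])
    return synthetic

# Usage: five minority points in the unit square, ten synthetic examples.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
new_points = smote_sketch(points, n_new=10, k=2)
```

Because each synthetic point lies on a segment joining two minority examples, the sketch never looks at majority-class examples. That is the limitation the abstract points to when it says SMOTE may overgeneralize the minority class region.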