Title :
Local neighbourhood extension of SMOTE for mining imbalanced data
Author :
Maciejewski, Tomasz ; Stefanowski, Jerzy
Author_Institution :
Inst. of Comput. Sci., Poznan Univ. of Technol., Poznań, Poland
Abstract :
In this paper we discuss the problems of inducing classifiers from imbalanced data and of improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples of the minority class between the closest neighbours from this class. However, SMOTE can also overgeneralize the minority class region because it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely. In the experiments we compare this method with the original SMOTE and with its two most closely related generalizations, Borderline and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
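The interpolation step that the abstract attributes to plain SMOTE can be illustrated with a minimal sketch. This is not the LN-SMOTE method proposed in the paper, and it is not the authors' implementation. The function name `smote_sketch`, its parameters `k`, `n_new` and `seed`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, k=2, n_new=4, seed=0):
    """Hypothetical helper: create n_new synthetic minority points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority example and one of its k nearest minority-class neighbours.
    Majority-class examples are ignored, which is the limitation the
    abstract points out for plain SMOTE.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority example.
        others = [p for p in minority if p is not base]
        # Keep the k closest ones by Euclidean distance.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base towards neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-dimensional minority class (made-up values).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
new_points = smote_sketch(minority)
```

Because every synthetic point is a convex combination of two minority examples, it always falls inside the region spanned by the minority class, even where that region overlaps majority examples.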
Keywords :
Bayes methods; data mining; pattern classification; SMOTE over-sampling method; focused resampling technique; imbalanced data mining; local neighbourhood extension; naive Bayes classifiers; Breast cancer; Data mining; Decision trees; Noise; Noise measurement; Sensitivity; Training
Conference_Title :
Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9926-7
DOI :
10.1109/CIDM.2011.5949434