DocumentCode :
2211022
Title :
Local neighbourhood extension of SMOTE for mining imbalanced data
Author :
Maciejewski, Tomasz ; Stefanowski, Jerzy
Author_Institution :
Inst. of Comput. Sci., Poznan Univ. of Technol., Poznań, Poland
fYear :
2011
fDate :
11-15 April 2011
Firstpage :
104
Lastpage :
111
Abstract :
In this paper we discuss the problems of inducing classifiers from imbalanced data and of improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE can also overgeneralize the minority class region, as it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which makes more precise use of information about the local neighbourhood of the considered examples. In the experiments we compare this method with the original SMOTE and with its two most closely related generalizations, Borderline-SMOTE and Safe-Level-SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
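The interpolation step that the abstract attributes to the original SMOTE can be illustrated with a minimal sketch. This is not the LN-SMOTE method proposed in the paper, and it is not the authors' code: the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration, and seed examples are drawn at random rather than visited one by one. Because it looks only at minority examples, it shows the blind spot the abstract describes: majority neighbours play no part in where the synthetic points land.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; majority examples are ignored entirely.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority example as the starting point.
        i = rng.randrange(len(minority))
        p = minority[i]
        # Brute-force search for its k nearest minority neighbours (Euclidean).
        others = [q for j, q in enumerate(minority) if j != i]
        others.sort(key=lambda q: math.dist(p, q))
        neighbour = rng.choice(others[:k])
        # Place the new example at a random position on the segment p -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(p, neighbour)])
    return synthetic


# Toy minority class with two numeric attributes (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Every synthetic point lies on a line segment between two minority examples, so a region between them that is dominated by majority examples would still receive new minority points.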
Keywords :
Bayes methods; data mining; pattern classification; SMOTE over-sampling method; focused resampling technique; imbalanced data mining; local neighbourhood extension; naive Bayes classifiers; Breast cancer; Data mining; Decision trees; Noise; Noise measurement; Sensitivity; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9926-7
Type :
conf
DOI :
10.1109/CIDM.2011.5949434
Filename :
5949434