DocumentCode :
1777017
Title :
Diversity and separable metrics in over-sampling technique for imbalanced data classification
Author :
Mahmoudi, Shadi ; Moradi, Parham ; Akhlaghian, Fardin ; Moradi, Rasoul
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Kurdistan, Kurdistan, Iran
fYear :
2014
fDate :
29-30 Oct. 2014
Firstpage :
152
Lastpage :
158
Abstract :
The imbalanced data problem in classification is a significant research area and has attracted a lot of attention in recent years. Techniques that rebalance the class distribution, such as over-sampling or under-sampling, are the most common approaches to this problem. This paper presents a new method called Diversity and Separable Metrics in Over-Sampling Technique (DSMOTE) to handle imbalanced learning problems. The main idea of DSMOTE is to use a diversity measure and a separable measure, both of which have a positive impact on the minority class. The diversity measure reduces overfitting, while the separable measure decreases the risk of generating new samples on decision boundaries among hard-to-learn samples. The proposed method improves learning accuracy in three stages: (1) removing abnormal samples from the minority class, (2) selecting the top three minority-class samples according to the desired criteria, and (3) generating a new sample from the selected samples. The experiments are conducted on five real-world datasets taken from Iran University of Medical Science and on six different UCI datasets. Moreover, three different classifiers, four resampling algorithms, and six performance evaluation measures are used to evaluate the proposed method. The reported results indicate that the proposed approach performs better than, or at least comparably to, state-of-the-art methods.
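The three stages named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the abnormality test, the diversity and separable measures, or the generation rule, so everything concrete below is an assumption made for illustration and is not the authors' DSMOTE: abnormal samples are taken to be those far from the minority centroid, diversity is the mean distance to the other minority samples, separability is the distance to the nearest majority sample, and a new sample is a random convex combination of the three selected samples. All function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_abnormal(minority, z=2.0):
    """Stage 1 (assumed rule): drop samples whose distance to the minority
    centroid exceeds mean + z * std of those distances."""
    dim = len(minority[0])
    centroid = [sum(p[i] for p in minority) / len(minority) for i in range(dim)]
    dists = [euclidean(p, centroid) for p in minority]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [p for p, d in zip(minority, dists) if d <= mean + z * std]


def select_top_three(minority, majority):
    """Stage 2 (assumed criteria): rank by diversity + separability."""
    def score(p):
        others = [q for q in minority if q is not p]
        # Assumed diversity: mean distance to the other minority samples.
        diversity = sum(euclidean(p, q) for q in others) / len(others)
        # Assumed separability: distance to the nearest majority sample.
        separability = min(euclidean(p, q) for q in majority)
        return diversity + separability
    return sorted(minority, key=score, reverse=True)[:3]


def generate_sample(selected, rng):
    """Stage 3 (assumed rule): random convex combination of the selection."""
    weights = [rng.random() + 1e-12 for _ in selected]
    total = sum(weights)
    dim = len(selected[0])
    return [sum(w / total * p[i] for w, p in zip(weights, selected))
            for i in range(dim)]


def oversample(minority, majority, n_new, seed=0):
    """Run the three stages and return n_new synthetic minority samples."""
    rng = random.Random(seed)
    cleaned = remove_abnormal(minority)
    if len(cleaned) < 3:
        raise ValueError("need at least three minority samples after cleaning")
    top3 = select_top_three(cleaned, majority)
    return [generate_sample(top3, rng) for _ in range(n_new)]
```

Because each synthetic sample is a convex combination, it always lies inside the triangle spanned by the three selected samples; a different generation rule would change that property.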
Keywords :
learning (artificial intelligence); pattern classification; sampling methods; DSMOTE method; Iran University of Medical Science; UCI dataset; class distribution techniques; diversity measure; diversity metric; imbalanced data classification; imbalanced learning problem; over-sampling technique; separable measure; separable metric; under-sampling technique; Accuracy; Classification algorithms; Data mining; Educational institutions; Euclidean distance; Vectors; Classification problems; Diversity measure; Imbalanced Data; Over-Sampling; Separable Measure;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-5486-5
Type :
conf
DOI :
10.1109/ICCKE.2014.6993409
Filename :
6993409