DocumentCode :
3705928
Title :
A hybrid sampling method for imbalanced data
Author :
Sami Gazzah;Amina Hechkel;Najoua Essoukri Ben Amara
Author_Institution :
University of Sousse, Tunisia - SAGE, Advanced Systems in Electrical Engineering, National Engineering School of Sousse, Tunisia
fYear :
2015
fDate :
3/1/2015 12:00:00 AM
Firstpage :
1
Lastpage :
6
Abstract :
As applications diversify and new, challenging ones emerge, for example in computer vision, classical machine learning systems often perform poorly when confronted with two common problems: negative training examples that outnumber the positive ones, and large intra-class variations. Both problems degrade system performance. In this work, we propose to improve classification accuracy on imbalanced training data by balancing the training set with a hybrid approach that over-samples the minority class using a “SMOTE star topology” and under-samples the majority class by removing instances considered less relevant. Feature vectors are deleted with respect to intra-class variations, based on a distribution criterion. Experimental results on biometric data show that the proposed approach significantly improves overall performance, measured in terms of true-positive rate.
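The hybrid balancing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not specify the details, so two assumptions are made here: the "star topology" is read as interpolating between each minority sample and the minority-class centroid, and "less relevant" majority instances are taken to be those farthest from the majority-class centroid. The function names `star_oversample`, `undersample_by_distance`, and `hybrid_balance` are hypothetical.

```python
import numpy as np


def star_oversample(X_min, n_new, rng):
    """Create n_new synthetic minority points.

    ASSUMPTION: "star topology" is interpreted as placing each synthetic
    point on the segment joining a random minority sample to the
    minority-class centroid.
    """
    center = X_min.mean(axis=0)
    idx = rng.integers(0, len(X_min), size=n_new)  # which samples to start from
    t = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[idx] + t * (center - X_min[idx])


def undersample_by_distance(X_maj, n_keep):
    """Keep the n_keep majority points closest to the majority centroid.

    ASSUMPTION: distance to the class centroid stands in for the paper's
    distribution-based relevance criterion, which the abstract does not define.
    """
    center = X_maj.mean(axis=0)
    dist = np.linalg.norm(X_maj - center, axis=1)
    keep = np.argsort(dist)[:n_keep]
    return X_maj[keep]


def hybrid_balance(X_min, X_maj, rng):
    """Balance two classes by meeting halfway between their sizes."""
    target = (len(X_min) + len(X_maj)) // 2
    n_new = max(target - len(X_min), 0)
    X_min_bal = np.vstack([X_min, star_oversample(X_min, n_new, rng)])
    X_maj_bal = undersample_by_distance(X_maj, target)
    return X_min_bal, X_maj_bal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_min = rng.normal(loc=2.0, size=(10, 3))  # toy minority class
    X_maj = rng.normal(loc=0.0, size=(90, 3))  # toy majority class
    a, b = hybrid_balance(X_min, X_maj, rng)
    print(a.shape, b.shape)
```

Meeting at the midpoint of the two class sizes is only one possible balancing target; the abstract states that the classes end up equally sized but not what that size is.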
Keywords :
"Training data","Principal component analysis","Databases","Support vector machines","Training","Feature extraction","Correlation"
Publisher :
IEEE
Conference_Title :
Systems, Signals & Devices (SSD), 2015 12th International Multi-Conference on
Type :
conf
DOI :
10.1109/SSD.2015.7348093
Filename :
7348093