Title :
Safe level graph for synthetic minority over-sampling techniques
Author :
Bunkhumpornpat, Chumphol ; Subpaiboonkit, Sitthichoke
Author_Institution :
Dept. of Comput. Sci., Chiang Mai Univ., Chiang Mai, Thailand
Abstract :
In the class imbalance problem, most existing classifiers, which are designed for balanced class distributions, fail to recognize minority classes because a large number of negative instances can dominate a few positive instances. Borderline-SMOTE and Safe-Level-SMOTE are over-sampling techniques that handle this situation by generating synthetic instances in different regions. The former operates on the border of a minority class, while the latter works inside the class, far from the border. Unfortunately, a data miner cannot conveniently determine which SMOTE variant suits a given dataset. In this paper, a safe level graph is proposed as a guideline tool that helps select an appropriate SMOTE and describes the characteristics of a minority class in an imbalanced dataset. When the advice of a safe level graph is followed, the experimental success rate is shown to reach 73% when an F-measure is used as the performance measure and 78% for satisfactory AUCs.
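The abstract does not spell out how the safe level graph is built, so the following is only a minimal sketch of the underlying quantity, under stated assumptions. It assumes the usual Safe-Level-SMOTE notion of a safe level (the number of minority instances among the k nearest neighbours of a minority instance) and tallies how many minority instances fall at each level. The function names `safe_levels` and `safe_level_histogram`, the Euclidean distance, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist


def safe_levels(points, labels, k=5, minority=1):
    """Safe level of each minority instance, in input order.

    Assumption: safe level = count of minority instances among the
    k nearest neighbours (Euclidean distance, instance itself excluded).
    """
    levels = []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            continue
        # Rank every other instance by its distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        levels.append(sum(labels[j] == minority for j in others[:k]))
    return levels


def safe_level_histogram(levels, k):
    """Number of minority instances at each safe level 0..k."""
    counts = Counter(levels)
    return [counts.get(s, 0) for s in range(k + 1)]


# Hypothetical toy data: three clustered minority points and one
# minority point surrounded by majority points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
          (0.9, 1.0)]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

levels = safe_levels(points, labels, k=2)
histogram = safe_level_histogram(levels, k=2)
```

On this toy data the clustered minority points receive the maximum safe level and the isolated one receives zero. Mass near level 0 suggests a minority class that is noisy or overlapped by the majority class, while mass near level k suggests a well-separated interior; how the paper maps such a profile to a choice between Borderline-SMOTE and Safe-Level-SMOTE is not stated in the abstract.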
Keywords :
data mining; graph theory; pattern classification; sampling methods; F-measure; balance dataset distribution; borderline-SMOTE; class imbalance problem; data miner; guideline tool; minority class border; safe level graph; safe-level-SMOTE; synthetic minority over-sampling techniques; Decision trees; Diabetes; Earth; Noise; Remote sensing; Satellites; Vectors; over-sampling
Conference_Title :
Communications and Information Technologies (ISCIT), 2013 13th International Symposium on
Conference_Location :
Surat Thani
Print_ISBN :
978-1-4673-5578-0
DOI :
10.1109/ISCIT.2013.6645923