Title :
A Comparative Study of Data Sampling and Cost Sensitive Learning
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL
Abstract :
Two common challenges data mining and machine learning practitioners face in many application domains are unequal classification costs and class imbalance. Most traditional data mining techniques attempt to maximize overall accuracy rather than minimize cost. When data is imbalanced, such techniques produce models that strongly favor the overrepresented class, which typically carries the lower cost of misclassification. Two techniques that have been used to address both issues are cost-sensitive learning and data sampling. In this work, we investigate the performance of two cost-sensitive learning techniques and four data sampling techniques for minimizing classification costs when data is imbalanced. We present a comprehensive suite of experiments, utilizing 15 datasets with 10 cost ratios, carefully designed to ensure conclusive, significant, and reliable results.
Keywords :
data mining; learning (artificial intelligence); pattern classification; application domain; class imbalance; cost sensitive learning; data sampling; machine learning; unequal classification costs; Conferences; Cost function; Machine learning algorithms; Sampling methods; Stability; Statistical analysis; Training data; USA Councils
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
DOI :
10.1109/ICDMW.2008.119