DocumentCode :
2129301
Title :
A Comparative Study of Data Sampling and Cost Sensitive Learning
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
46
Lastpage :
52
Abstract :
Two common challenges data mining and machine learning practitioners face in many application domains are unequal classification costs and class imbalance. Most traditional data mining techniques attempt to maximize overall accuracy rather than minimize cost. When data is imbalanced, such techniques result in models that highly favor the overrepresented class, the class which typically carries a lower cost of misclassification. Two techniques that have been used to address both of these issues are cost sensitive learning and data sampling. In this work, we investigate the performance of two cost sensitive learning techniques and four data sampling techniques for minimizing classification costs when data is imbalanced. We present a comprehensive suite of experiments, utilizing 15 datasets with 10 cost ratios, which have been carefully designed to ensure conclusive, significant and reliable results.
Keywords :
data mining; learning (artificial intelligence); pattern classification; application domain; class imbalance; cost sensitive learning; data sampling; machine learning; unequal classification costs; Conferences; Cost function; Machine learning algorithms; Sampling methods; Stability; Statistical analysis; Training data; USA Councils
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.119
Filename :
4733920