DocumentCode :
2984531
Title :
Class Probability Estimates are Unreliable for Imbalanced Data (and How to Fix Them)
Author :
Wallace, B.C. ; Dahabreh, I.J.
Author_Institution :
Dept. of HSPP, Brown Univ., Providence, RI, USA
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
695
Lastpage :
704
Abstract :
Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increase this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates attained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. Motivated by our exposition of this issue, we propose a simple, effective and theoretically motivated method to mitigate the bias of probability estimates for imbalanced data that bags estimators calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.
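The idea of bagging estimators fit over balanced bootstrap samples can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-feature logistic model fit by plain gradient descent, and every name in it (fit_logistic, balanced_bootstrap, bagged_balanced_models, predict_proba) is hypothetical. Each bag draws equal numbers of minority and majority examples with replacement, and the per-bag probabilities are averaged.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=300):
    # Hypothetical helper: one-feature logistic regression
    # trained by batch gradient descent on the log loss.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def balanced_bootstrap(xs, ys, rng):
    # Sample, with replacement, k examples from each class,
    # where k is the size of the smaller (minority) class.
    pos = [i for i, y in enumerate(ys) if y == 1]
    neg = [i for i, y in enumerate(ys) if y == 0]
    k = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(k)]
    idx += [rng.choice(neg) for _ in range(k)]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def bagged_balanced_models(xs, ys, n_bags=25, seed=0):
    # Fit one model per balanced bootstrap sample.
    rng = random.Random(seed)
    return [fit_logistic(*balanced_bootstrap(xs, ys, rng))
            for _ in range(n_bags)]


def predict_proba(models, x):
    # Returns the bagged (mean) estimate and the per-bag estimates.
    per_bag = [sigmoid(w * x + b) for w, b in models]
    return sum(per_bag) / len(per_bag), per_bag


# Synthetic imbalanced data (an assumption for illustration only):
# 500 majority examples around 0, 25 minority examples around 2.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
xs += [data_rng.gauss(2.0, 1.0) for _ in range(25)]
ys = [0] * 500 + [1] * 25

models = bagged_balanced_models(xs, ys)
mean_p, per_bag = predict_proba(models, 2.0)
print(round(mean_p, 3), round(min(per_bag), 3), round(max(per_bag), 3))
```

The spread of the per-bag estimates gives a rough view of the uncertainty around the bagged estimate. The abstract's Bayesian treatment of posterior distributions over bagged estimates is not reproduced here.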
Keywords :
Bayes methods; estimation theory; learning (artificial intelligence); pattern classification; statistical distributions; Bayesian approach; class probability estimation; classification system; imbalanced data; posterior distribution; supervised learning; Bagging; Calibration; Estimation; Logistics; Mathematical model; Sensitivity; class imbalance; probability estimates;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.115
Filename :
6413859