• DocumentCode
    2984531
  • Title

    Class Probability Estimates are Unreliable for Imbalanced Data (and How to Fix Them)

  • Author

    Wallace, B.C. ; Dahabreh, I.J.

  • Author_Institution
    Dept. of HSPP, Brown Univ., Providence, RI, USA
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    695
  • Lastpage
    704
  • Abstract
    Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increase this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates attained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. Motivated by our exposition of this issue, we propose a simple, effective and theoretically motivated method that mitigates the bias of probability estimates for imbalanced data by bagging estimators calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.
  • Keywords
    Bayes methods; estimation theory; learning (artificial intelligence); pattern classification; statistical distributions; Bayesian approach; class probability estimation; classification system; imbalanced data; posterior distribution; supervised learning; Bagging; Calibration; Estimation; Logistics; Mathematical model; Sensitivity; class imbalance; probability estimates;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2012.115
  • Filename
    6413859
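
The core idea named in the abstract, bagging estimators fit over balanced bootstrap samples and averaging their probability estimates, can be illustrated with a minimal sketch. The abstract does not specify the base estimator or sampling details, so everything here is an assumption for illustration: the plain gradient-descent logistic regression, the helper names (`fit_logistic`, `predict_proba`, `balanced_bagged_proba`), the number of bags, and the synthetic data. It is not the authors' implementation, and it does not cover the Bayesian treatment of the bagged estimates.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Gradient-descent logistic regression (assumed stand-in base estimator)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Estimated probability of the positive (minority) class."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def balanced_bagged_proba(X, y, X_new, n_bags=25, seed=0):
    """Average the estimates of models fit on balanced bootstrap samples."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n = min(len(pos), len(neg))  # draw the same number from each class
    probs = []
    for _ in range(n_bags):
        idx = np.concatenate([
            rng.choice(pos, size=n, replace=True),
            rng.choice(neg, size=n, replace=True),
        ])
        w = fit_logistic(X[idx], y[idx])
        probs.append(predict_proba(w, X_new))
    return np.mean(probs, axis=0)

# Synthetic imbalanced data (hypothetical): 950 majority, 50 minority points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(950, 2)),
               rng.normal(1.5, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

X_min = X[y == 1]                                  # minority instances only
p_plain = predict_proba(fit_logistic(X, y), X_min)  # fit on imbalanced data
p_bag = balanced_bagged_proba(X, y, X_min)          # balanced bagging
print("mean minority estimate, plain fit:", p_plain.mean())
print("mean minority estimate, balanced bagging:", p_bag.mean())
```

On data like this, comparing the two printed means shows how a model fit on the imbalanced sample tends to assign lower probabilities to minority instances than the balanced bagged ensemble. The demo scores training instances for brevity; a careful evaluation would use held-out data and also check overall calibration.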