• DocumentCode
    3726482
  • Title
    Calibrating Probability with Undersampling for Unbalanced Classification
  • Author
    Andrea Dal Pozzolo;Olivier Caelen;Reid A. Johnson;Gianluca Bontempi

  • Author_Institution
    Comput. Sci. Dept., Univ. Libre de Bruxelles, Brussels, Belgium
  • fYear
    2015
  • Firstpage
    159
  • Lastpage
    166
  • Abstract
    Undersampling is a popular technique for unbalanced datasets to reduce the skew in class distributions. However, it is well known that undersampling one class modifies the priors of the training set and consequently biases the posterior probabilities of a classifier. In this paper, we study analytically and experimentally how undersampling affects the posterior probability of a machine learning model. We formalize the problem of undersampling and explore the relationship between conditional probability in the presence and absence of undersampling. Although the bias due to undersampling does not affect the ranking order returned by the posterior probability, it significantly impacts the classification accuracy and probability calibration. We use Bayes Minimum Risk theory to find the correct classification threshold and show how to adjust it after undersampling. Experiments on several real-world unbalanced datasets validate our results.
  • Keywords
    "Training","Yttrium","Testing","Electronic mail","Prediction algorithms","Computer science","Estimation"
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence, 2015 IEEE Symposium Series on
  • Print_ISBN
    978-1-4799-7560-0
  • Type
    conf

  • DOI
    10.1109/SSCI.2015.33
  • Filename
    7376606
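
  • Note (illustrative sketch, not part of the record)
    The abstract describes a bias in the posterior after undersampling and a threshold adjustment, but gives no formulas. The sketch below shows one way to express that relationship. It assumes the positive class is kept whole and each negative example is kept independently with probability beta; under that assumption Bayes' rule gives p_s = p / (p + beta * (1 - p)). The function names and the symbol beta are hypothetical and chosen for illustration only.

```python
def biased_posterior(p, beta):
    """Posterior seen after undersampling, given the true posterior p.

    Assumes negatives are kept with probability beta (0 < beta <= 1)
    and all positives are kept.
    """
    return p / (p + beta * (1.0 - p))


def corrected_posterior(p_s, beta):
    """Invert biased_posterior: recover p from the undersampled posterior p_s."""
    return beta * p_s / (beta * p_s - p_s + 1.0)


def undersampled_threshold(tau, beta):
    """Threshold on p_s that is equivalent to threshold tau on p.

    The mapping p -> p_s is monotone increasing, so ranking is preserved
    and only the decision threshold has to move.
    """
    return tau / (tau + beta * (1.0 - tau))


if __name__ == "__main__":
    beta = 0.1   # hypothetical: keep 10% of the negatives
    p = 0.2      # hypothetical true posterior
    p_s = biased_posterior(p, beta)
    print(p_s, corrected_posterior(p_s, beta), undersampled_threshold(0.5, beta))
```

    With beta = 1 (no undersampling) all three functions reduce to the identity, which is a quick sanity check on the algebra.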