• DocumentCode
    1343992
  • Title
    Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation
  • Author
    Rodríguez, Juan Diego; Pérez, Aritz; Lozano, Jose Antonio
  • Author_Institution
    Comput. Sci. Fac., Univ. of the Basque Country (UPV-EHU), San Sebastian, Spain
  • Volume
    32
  • Issue
    3
  • fYear
    2010
  • fDate
    3/1/2010 12:00:00 AM
  • Firstpage
    569
  • Lastpage
    575
  • Abstract
    In the machine learning field, the performance of a classifier is usually measured in terms of prediction error. In most real-world problems, the error cannot be calculated exactly and must be estimated. Therefore, it is important to choose an appropriate estimator of the error. This paper analyzes the statistical properties, bias and variance, of the k-fold cross-validation classification error estimator (k-cv). Our main contribution is a novel theoretical decomposition of the variance of the k-cv considering its sources of variance: sensitivity to changes in the training set and sensitivity to changes in the folds. The paper also compares the bias and variance of the estimator for different values of k. The experimental study has been performed in artificial domains because they allow the exact computation of the quantities involved and the conditions of experimentation can be specified rigorously. The experimentation has been performed for two classifiers (naive Bayes and nearest neighbor), different numbers of folds, sample sizes, and training sets coming from assorted probability distributions. We conclude with some practical recommendations on the use of k-fold cross-validation.
  • Keywords
    Bayes methods; estimation theory; learning (artificial intelligence); pattern classification; probability; statistical analysis; classification error estimator; k-fold cross-validation; machine learning; naive Bayes method; nearest neighbor algorithm; prediction error estimation; probability distribution; sensitivity analysis; statistical properties; bias and variance; decomposition of the variance; error estimation; k-fold cross validation; prediction error; sources of sensitivity; supervised classification
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2009.187
  • Filename
    5342427