• DocumentCode
    3724054
  • Title

    Accurate Estimation of Generalization Performance for Active Learning

  • Author

    Aubrey Gress;Ian Davidson

  • fYear
    2015
  • Firstpage
    131
  • Lastpage
    140
  • Abstract
    Active learning is a crucial method in settings where human labels for instances are difficult to obtain. The typical active learning loop builds a model from a few labeled instances, chooses informative unlabeled instances, asks an oracle (i.e., a human) to label them, and then rebuilds the model. Active learning is widely used, and much research attention has focused on determining which instances to ask the human to label. However, an understudied problem is estimating the accuracy of the learner when instances are added actively. This is a problem because regular cross-validation methods may not work well due to the bias in selecting instances to label. We show that existing methods for estimating performance are not suitable for practitioners, since the scaling coefficients can have high variance, the estimators can produce nonsensical results, and the estimates are empirically inaccurate in the classification setting. We propose a new general active learning method that more accurately estimates generalization performance through a sampling step and a new weighted cross-validation estimator. Our method can be used with a variety of query strategies and learners. We empirically illustrate the benefits of our method to the practitioner by showing that it is more accurate than the standard weighted cross-validation estimator and that, when used as part of a termination criterion, it obtains more accurate estimates of generalization error while having comparable generalization performance.
  • Keywords
    "Training","Training data","Learning systems","Standards","Labeling","Logistics","Conferences"
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2015 IEEE International Conference on
  • ISSN
    1550-4786
  • Type

    conf

  • DOI
    10.1109/ICDM.2015.137
  • Filename
    7373317