• DocumentCode
    22303
  • Title

    You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems

  • Author

    Groce, Alex ; Kulesza, Todd ; Zhang, Chaoqiang ; Shamasunder, Shalini ; Burnett, Margaret ; Wong, Weng-Keen ; Stumpf, Simone ; Das, S. ; Shinsel, Amber ; Bice, Forrest ; McIntosh, Kylee

  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Oregon State Univ., Corvallis, OR, USA
  • Volume
    40
  • Issue
    3
  • fYear
    2014
  • fDate
    March 2014
  • Firstpage
    307
  • Lastpage
    323
  • Abstract
    How do you test a program when only a single user, with no expertise in software testing, is able to determine if the program is performing correctly? Such programs are common today in the form of machine-learned classifiers. We consider the problem of testing this common kind of machine-generated program when the only oracle is an end user: e.g., only you can determine if your email is properly filed. We present test selection methods that provide very good failure rates even for small test suites, and show that these methods work in both large-scale random experiments using a “gold standard” and in studies with real users. Our methods are inexpensive and largely algorithm-independent. Key to our methods is an exploitation of properties of classifiers that is not possible in traditional software testing. Our results suggest that it is plausible for time-pressured end users to interactively detect failures, even very hard-to-find failures, without wading through a large number of successful (and thus less useful) tests. We additionally show that some methods are able to find the arguably most difficult-to-detect faults of classifiers: cases where machine learning algorithms have high confidence in an incorrect result.
  • Keywords
    interactive systems; learning (artificial intelligence); program testing; effective test selection; email; end users; hard-to-find failures; interactive failure detection; interactive machine learning systems; machine generated program; machine learned classifiers; software testing; Electronic mail; Machine learning algorithms; Software; Software algorithms; Testing; Training; Training data; Machine learning; end-user testing; test suite size
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/TSE.2013.59
  • Filename
    6682887