• DocumentCode
    3165381
  • Title
    Detecting Fractures in Classifier Performance
  • Author
    Cieslak, David A. ; Chawla, Nitesh V.
  • Author_Institution
    Univ. of Notre Dame, Notre Dame
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    123
  • Lastpage
    132
  • Abstract
    A fundamental assumption of many classification algorithms is that training and testing samples are drawn from the same data distribution; this is the stationary distribution assumption. It entails that the past is strongly indicative of the future. However, in real-world applications, many factors may alter the One True Model responsible for generating the data distribution, both significantly and subtly. When the stationary distribution assumption is violated, traditional validation schemes such as ten-fold cross-validation and hold-out become poor performance predictors and classifier rankers. Thus, it becomes critical to discover the fracture points in classifier performance by discovering the divergence between populations. In this paper, we implement a comprehensive evaluation framework to identify bias, enabling selection of a "correct" classifier given the sample bias. To thoroughly evaluate the performance of classifiers within biased distributions, we consider the following three scenarios: missing completely at random (akin to stationary); missing at random; and missing not at random. The last of these reflects the canonical sample selection bias problem.
  • Keywords
    data mining; pattern classification; biased distributions; classification algorithms; classifier performance; data distribution; fracture points; stationary distribution assumption; Classification algorithms; Computer science; Data engineering; Data mining; Decision trees; Machine learning; Measurement; Risk management; Testing; Virtual colonoscopy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3018-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2007.106
  • Filename
    4470236
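
The three sampling scenarios named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's evaluation framework: the synthetic population, the selection probabilities, and the helper names (`make_population`, `select`, `shift_summary`) are all assumptions made for illustration. It draws a training sample under each mechanism and reports how far the sample drifts from the population in feature mean and in positive-class rate, a deliberately crude stand-in for a divergence measure.

```python
import random


def make_population(n, seed=0):
    """Synthetic population (an assumption): x ~ Uniform(0, 1), P(y = 1 | x) = x."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        x = rng.random()
        y = 1 if rng.random() < x else 0
        population.append((x, y))
    return population


def select(population, mechanism, seed=1):
    """Keep each record with a probability set by the chosen mechanism."""
    rng = random.Random(seed)
    kept = []
    for x, y in population:
        if mechanism == "MCAR":    # selection ignores both feature and label
            p = 0.5
        elif mechanism == "MAR":   # selection depends only on the observed feature
            p = 0.1 + 0.8 * x
        elif mechanism == "MNAR":  # selection depends on the label itself
            p = 0.8 if y == 1 else 0.2
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() < p:
            kept.append((x, y))
    return kept


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def shift_summary(population, sample):
    """Return (feature-mean shift, positive-rate shift) of sample vs. population."""
    dx = mean(x for x, _ in sample) - mean(x for x, _ in population)
    dy = mean(y for _, y in sample) - mean(y for _, y in population)
    return dx, dy


if __name__ == "__main__":
    pop = make_population(20000)
    for mechanism in ("MCAR", "MAR", "MNAR"):
        dx, dy = shift_summary(pop, select(pop, mechanism))
        print(f"{mechanism}: feature shift {dx:+.3f}, label shift {dy:+.3f}")
```

Under these assumed settings the MCAR sample should track the population closely, while the MAR and MNAR samples drift. The marginal label rate also moves under MAR here because the label depends on the feature; what separates the two cases is that P(y | x) is preserved in the MAR sample but not in the MNAR one.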