DocumentCode
3165381
Title
Detecting Fractures in Classifier Performance
Author
Cieslak, David A. ; Chawla, Nitesh V.
Author_Institution
Univ. of Notre Dame, Notre Dame
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
123
Lastpage
132
Abstract
Many classification algorithms assume that training and testing samples are drawn from the same data distribution; this is the stationary distribution assumption. It implies that the past is strongly indicative of the future. However, in real-world applications, many factors may alter the One True Model responsible for generating the data distribution, both significantly and subtly. When the stationary distribution assumption is violated, traditional validation schemes such as ten-fold cross-validation and hold-out become poor predictors of performance and poor rankers of classifiers. Thus, it becomes critical to discover the fracture points in classifier performance by detecting the divergence between populations. In this paper, we implement a comprehensive evaluation framework to identify bias, enabling selection of a "correct" classifier given the sample bias. To thoroughly evaluate the performance of classifiers within biased distributions, we consider the following three scenarios: missing completely at random (akin to stationary); missing at random; and missing not at random. The last of these reflects the canonical sample selection bias problem.
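The three scenarios named in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's framework. It assumes a common reading of the terms: selection is independent of both feature and label (missing completely at random), depends only on the observed feature (missing at random), or depends on the label itself (missing not at random). The synthetic population, the keep probabilities, and all function names are hypothetical choices made for illustration.

```python
import random


def make_population(n, seed=0):
    """Synthetic population: one Gaussian feature x and a noisy binary label y."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if x + rng.gauss(0.0, 1.0) > 0 else 0
        pop.append((x, y))
    return pop


def select(pop, keep_prob, seed=1):
    """Keep each example with probability keep_prob(x, y)."""
    rng = random.Random(seed)
    return [(x, y) for x, y in pop if rng.random() < keep_prob(x, y)]


# Hypothetical selection mechanisms (assumed probabilities, for illustration).
def mcar(x, y):
    return 0.5                       # ignores both x and y


def mar(x, y):
    return 0.8 if x > 0 else 0.2     # depends only on the observed feature


def mnar(x, y):
    return 0.8 if y == 1 else 0.2    # depends on the label itself


def mean_x(sample):
    return sum(x for x, _ in sample) / len(sample)


def pos_rate(sample):
    return sum(y for _, y in sample) / len(sample)


population = make_population(20000)
samples = {
    "MCAR": select(population, mcar),
    "MAR": select(population, mar),
    "MNAR": select(population, mnar),
}

if __name__ == "__main__":
    print("population", round(mean_x(population), 3), round(pos_rate(population), 3))
    for name, sample in samples.items():
        print(name, round(mean_x(sample), 3), round(pos_rate(sample), 3))
```

Under these assumptions, the MCAR sample should track the population's feature mean and class balance closely, while the MAR and MNAR samples drift away from it. That drift between the sampled and true populations is the kind of divergence the abstract refers to.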
Keywords
data mining; pattern classification; biased distributions; classification algorithms; classifier performance; data distribution; fracture points; stationary distribution assumption; Classification algorithms; Computer science; Data engineering; Data mining; Decision trees; Machine learning; Measurement; Risk management; Testing; Virtual colonoscopy
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location
Omaha, NE
ISSN
1550-4786
Print_ISBN
978-0-7695-3018-5
Type
conf
DOI
10.1109/ICDM.2007.106
Filename
4470236