Title :
Using rule sets to maximize ROC performance
Author_Institution :
Hewlett-Packard Co., Palo Alto, CA, USA
Abstract :
Rules are commonly used for classification because they are modular, intelligible, and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications that maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed or error costs are unequal, an accuracy-maximizing rule set can perform poorly. A more flexible use of a rule set is to produce instance scores indicating the likelihood that an instance belongs to a given class. With this ability, rule sets can be applied effectively when distributions are skewed or error costs are unequal. This paper empirically investigates different strategies for evaluating rule sets when the goal is to maximize scoring (ROC) performance.
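The abstract does not spell out which evaluation strategies the paper compares. The following is a minimal sketch of the general idea only. It turns a rule set into instance scores and measures ROC performance as the area under the ROC curve (AUC). It assumes a two-class problem and an ordered rule list (decision-list semantics), where each rule stores the positive and negative training counts it covered. The score is the Laplace-corrected precision of the first rule that fires, which is one common choice and not necessarily one of the paper's strategies. The names `laplace`, `score`, and `auc` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation (an assumption, not from the paper): a rule is a
# condition over an instance plus the positive/negative training counts it covered.
Instance = Dict[str, float]
Rule = Tuple[Callable[[Instance], bool], int, int]  # (condition, pos, neg)


def laplace(pos: int, neg: int) -> float:
    """Laplace-corrected estimate of P(positive | rule fires) for two classes."""
    return (pos + 1) / (pos + neg + 2)


def score(rules: List[Rule], x: Instance, default: float) -> float:
    """Score x by the first rule whose condition holds (decision-list semantics);
    fall back to a default score when no rule fires."""
    for cond, pos, neg in rules:
        if cond(x):
            return laplace(pos, neg)
    return default


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy rule set and data, invented purely for illustration.
    rules: List[Rule] = [
        (lambda x: x["a"] > 5, 8, 2),  # Laplace score 9/12 = 0.75
        (lambda x: x["b"] < 1, 3, 5),  # Laplace score 4/10 = 0.4
    ]
    data = [({"a": 7, "b": 0}, 1), ({"a": 2, "b": 0.5}, 1),
            ({"a": 1, "b": 3}, 0), ({"a": 6, "b": 2}, 0)]
    s = [score(rules, x, default=0.2) for x, _ in data]
    print(s, auc(s, [y for _, y in data]))  # [0.75, 0.4, 0.2, 0.75] 0.625
```

Because AUC depends only on how instances are ranked, two rule sets with identical categorical predictions can still differ in ROC performance depending on how their rules are turned into scores. This is the motivation the abstract gives for evaluating rule sets as scorers.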
Keywords :
data mining; errors; learning (artificial intelligence); optimisation; pattern classification; performance index; probability; sensitivity analysis; ROC performance maximization; categorical classifications; classification accuracy; classification rule learning; instance scores; machine learning; receiver operating characteristic; rule sets; scoring performance maximization; skewed class distributions; unequal error costs; Association rules; Classification tree analysis; Costs; Data mining; Decision theory; Error analysis; Laboratories; Machine learning; Milling machines; Robustness;
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989510