Title :
Classification by CUT: Clearance under Threshold
Author :
McBride, Ryan ; Ke Wang ; Wenyuan Li
Author_Institution :
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
Abstract :
Identifying bad objects hidden amidst many good objects is important for public safety and decision-making. These problems are complicated in that the cost of leaving a bad object unidentified may not be specified easily, making it difficult to apply existing cost-sensitive classification that depends on knowing a cost matrix or cost distribution. A compelling case for this "illusive cost" issue is presented in our project of identifying contaminated transformers with an industrial partner. To address this problem, we present an alternative formulation of cost-sensitive classification, Clearance Under Threshold (CUT) Classification. Given a training set, CUT classification is to partition the attribute space such that a partition is cleared if the probability of a future object in this partition being bad is less than a user-specified threshold. The goal is to clear many low-risk objects so that users can more effectively target high-risk objects. We present a solution to this problem and evaluate it on a case study for clearing contaminated transformers and on public benchmarks from UC Irvine's Machine Learning Repository. According to the experiments, our algorithms performed far better than the baselines derived from previous classification approaches.
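Illustration (not part of the original record): the clearance rule stated in the abstract can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions and is not the authors' algorithm. It assumes the partitions are already given, that each partition is summarized by hypothetical (bad, total) training counts, and that the probability of a future bad object is bounded from above with a one-sided Wilson score bound. The names wilson_upper and clear_partitions, the default z value, and the sample counts are all made up for this example.

```python
from math import sqrt


def wilson_upper(bad, total, z=1.645):
    """Upper end of the Wilson score interval for the bad-object rate.

    Assumption: a Wilson bound stands in for "the probability of a
    future object in this partition being bad"; the paper may use a
    different estimate.
    """
    if total == 0:
        # No evidence at all: treat the partition as maximally risky.
        return 1.0
    p = bad / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre + margin) / denom


def clear_partitions(counts, threshold):
    """Return the ids of partitions whose upper bound is below threshold."""
    return [
        pid
        for pid, (bad, total) in counts.items()
        if wilson_upper(bad, total) < threshold
    ]


# Hypothetical partitions: id -> (bad objects, all objects) in training data.
counts = {"A": (0, 200), "B": (3, 50), "C": (1, 400)}
cleared = clear_partitions(counts, threshold=0.05)
print(cleared)
```

Using an upper bound rather than the raw bad/total ratio is a conservative design choice: a partition with very few training objects is not cleared just because none of them happened to be bad.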
Keywords :
decision making; learning (artificial intelligence); object recognition; pattern classification; probability; CUT classification; UC Irvine machine learning repository; clearance under threshold classification; contaminated transformers; cost distribution; cost matrix; cost-sensitive classification; future object probability; hidden object identification; high-risk objects; illusive cost issue; public benchmarks; public safety; Decision trees; Educational institutions; Oil insulation; Power transformer insulation; Sociology; Training; Transmission line matrix methods; Classification; Classification for Imbalanced Data
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4303-6
DOI :
10.1109/ICDM.2014.75