DocumentCode :
1205983
Title :
"Missing is useful": missing values in cost-sensitive decision trees
Author :
Zhang, Shichao ; Qin, Zhenxing ; Ling, Charles X. ; Sheng, Shengli
Author_Institution :
Dept. of Autom. Control, Beijing Univ. of Aeronaut. & Astronaut., China
Volume :
17
Issue :
12
fYear :
2005
Firstpage :
1689
Lastpage :
1693
Abstract :
Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards them as a problem and attempts to impute them before training and testing. In this paper, we study this issue in cost-sensitive learning that considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it would be more cost-effective to leave them missing, similar to skipping expensive and risky tests (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications, and therefore it is not meaningful to impute them. We discuss and compare several strategies that utilize only known values and that exploit the idea that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
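The trade-off described in the abstract can be illustrated with a small arithmetic sketch. This is not the paper's algorithm; it only compares the expected total cost (test cost plus expected misclassification cost) of performing one test against the cost of leaving its value missing. All function names, probabilities, and cost figures below are hypothetical and chosen purely for illustration.

```python
def expected_misclassification_cost(p_positive, fp_cost, fn_cost):
    """Expected cost of the cheaper label, given P(class = positive)."""
    cost_if_predict_positive = (1.0 - p_positive) * fp_cost  # risk of a false positive
    cost_if_predict_negative = p_positive * fn_cost          # risk of a false negative
    return min(cost_if_predict_positive, cost_if_predict_negative)


def total_cost_with_test(test_cost, p_outcome, p_positive_given_outcome,
                         fp_cost, fn_cost):
    """Test cost plus expected misclassification cost after seeing the result."""
    expected = 0.0
    for outcome, p_out in p_outcome.items():
        expected += p_out * expected_misclassification_cost(
            p_positive_given_outcome[outcome], fp_cost, fn_cost)
    return test_cost + expected


def should_skip_test(test_cost, p_positive, p_outcome,
                     p_positive_given_outcome, fp_cost, fn_cost):
    """True when leaving the value missing is at least as cheap as testing."""
    cost_without = expected_misclassification_cost(p_positive, fp_cost, fn_cost)
    cost_with = total_cost_with_test(test_cost, p_outcome,
                                     p_positive_given_outcome, fp_cost, fn_cost)
    return cost_without <= cost_with


# Hypothetical numbers (assumptions, not taken from the paper).
FP_COST, FN_COST = 200.0, 600.0             # misclassification costs
P_POSITIVE = 0.3                            # prior probability of the positive class
P_OUTCOME = {"high": 0.4, "low": 0.6}       # distribution of the test result
P_POS_GIVEN = {"high": 0.6, "low": 0.1}     # posterior after each result

for cost in (50.0, 100.0):
    skip = should_skip_test(cost, P_POSITIVE, P_OUTCOME, P_POS_GIVEN,
                            FP_COST, FN_COST)
    print(f"test cost {cost}: {'skip the test' if skip else 'perform the test'}")
```

With these assumed figures, the cheaper test pays for itself through lower expected misclassification cost, while the more expensive one does not, so its value is better left missing.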
Keywords :
cost reduction; data mining; decision trees; learning (artificial intelligence); pattern classification; cost reduction; cost-sensitive decision tree learning; data mining; machine learning; misclassification costs; missing values; patient diagnosis; real-world data sets; test costs; Costs; Data mining; Decision trees; Knowledge acquisition; Learning systems; Life testing; Machine learning; Medical diagnosis; Medical tests; Predictive models; Index Terms: induction; knowledge acquisition; machine learning
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2005.188
Filename :
1524968