DocumentCode :
2219709
Title :
One tree to explain them all
Author :
Johansson, Ulf ; Sönströd, Cecilia ; Löfström, Tuve
Author_Institution :
Sch. of Bus. & Inf., Univ. of Boras, Boras, Sweden
fYear :
2011
fDate :
5-8 June 2011
Firstpage :
1444
Lastpage :
1451
Abstract :
Random forest is a frequently used ensemble technique, renowned for its high predictive performance. Random forest models are, however, inherently opaque due to their sheer complexity, making human interpretation and analysis impossible. This paper presents a method of approximating the random forest with just one decision tree. The approach uses oracle coaching, a recently suggested technique in which a weaker but transparent model is generated using combinations of regular training data and test data initially labeled by a strong classifier, called the oracle. In this study, the random forest plays the part of the oracle, while the transparent models are decision trees generated either by the standard tree inducer J48 or by evolving genetic programs. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves both accuracy and area under the ROC curve, compared to using training data only. In fact, the resulting single-tree models are as accurate as the random forest on the specific test instances. Most importantly, this is not achieved by inducing or evolving huge trees with perfect fidelity; a large majority of all trees are instead rather compact and clearly comprehensible. The experiments also show that evolution outperformed J48 with regard to accuracy, but that this came at the expense of slightly larger trees.
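The oracle coaching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bagged ensemble of decision stumps is only a stand-in for the random forest oracle, the single stump is a stand-in for the J48 or evolved tree, and all function names (`fit_stump`, `fit_bagged_stumps`, `oracle_coach`) are hypothetical. It shows one of the possible data combinations, namely training data plus oracle-labeled test instances.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split decision stump (stand-in for a transparent tree).

    Tries every feature/threshold pair and keeps the split with the
    fewest training errors. Returns a function mapping a row to a label.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_bagged_stumps(X, y, n_models=25, seed=0):
    """Hypothetical opaque oracle: majority vote over bootstrapped stumps.

    Assumption: this replaces the random forest used in the paper.
    """
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]


def oracle_coach(X_train, y_train, X_test, fit_oracle, fit_transparent):
    """Train a transparent model on training data plus oracle-labeled test data."""
    oracle = fit_oracle(X_train, y_train)
    # The oracle, not the true target, supplies labels for the test instances.
    oracle_labels = [oracle(row) for row in X_test]
    model = fit_transparent(X_train + X_test, y_train + oracle_labels)
    return model, oracle_labels


def fidelity(model, X_test, oracle_labels):
    """Fraction of test instances where the model agrees with the oracle."""
    hits = sum(model(row) == lab for row, lab in zip(X_test, oracle_labels))
    return hits / len(X_test)
```

Calling `oracle_coach(X_train, y_train, X_test, fit_bagged_stumps, fit_stump)` returns the coached transparent model together with the oracle's labels, and `fidelity` then measures how closely that model mimics the oracle on exactly those test instances. The true test labels are never used, which is why the technique applies only to the specific test set at hand.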
Keywords :
decision trees; learning (artificial intelligence); pattern classification; ROC curve; UCI; decision tree; ensemble technique; genetic program; human interpretation; oracle coaching; random forest; regular training data; Accuracy; Data models; Decision trees; Predictive models; Training; Training data; Vegetation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Evolutionary Computation (CEC), 2011 IEEE Congress on
Conference_Location :
New Orleans, LA
ISSN :
Pending
Print_ISBN :
978-1-4244-7834-7
Type :
conf
DOI :
10.1109/CEC.2011.5949785
Filename :
5949785