DocumentCode
2219709
Title
One tree to explain them all
Author
Johansson, Ulf ; Sönströd, Cecilia ; Löfström, Tuve
Author_Institution
Sch. of Bus. & Inf., Univ. of Boras, Boras, Sweden
fYear
2011
fDate
5-8 June 2011
Firstpage
1444
Lastpage
1451
Abstract
Random forest is an often-used ensemble technique, renowned for its high predictive performance. Random forest models are, however, inherently opaque due to their sheer complexity, making human interpretation and analysis impossible. This paper presents a method of approximating the random forest with just one decision tree. The approach uses oracle coaching, a recently suggested technique where a weaker but transparent model is generated using combinations of regular training data and test data initially labeled by a strong classifier, called the oracle. In this study, the random forest plays the part of the oracle, while the transparent models are decision trees generated either by the standard tree inducer J48 or by evolving genetic programs. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves both accuracy and area under the ROC curve, compared to using training data only. In fact, the resulting single-tree models are as accurate as the random forest on the specific test instances. Most importantly, this is not achieved by inducing or evolving huge trees with perfect fidelity; a large majority of the trees are instead rather compact and clearly comprehensible. The experiments also show that the evolutionary approach outperformed J48 with regard to accuracy, but at the expense of slightly larger trees.
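The oracle coaching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-split tree `fit_stump` is a hypothetical stand-in for J48 or an evolved genetic program, the `oracle` argument is any callable standing in for the random forest, and all names and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split decision tree (a stump) by minimizing training errors.

    Hypothetical stand-in for the transparent tree inducer (J48 or GP).
    Returns a function mapping a feature row to a predicted label.
    """
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:
        # No usable split: fall back to predicting the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, feature, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def oracle_coach(X_train, y_train, X_test, oracle, fit=fit_stump):
    """Train the transparent model on training data plus oracle-labeled test data.

    `oracle` is any callable row -> label (assumed here; in the paper's setting
    it would be the random forest's prediction for that test instance).
    """
    oracle_labels = [oracle(row) for row in X_test]
    return fit(X_train + X_test, y_train + oracle_labels)


# Toy data (invented for illustration): the oracle's boundary lies at 2.5,
# which the training data alone cannot reveal.
X_train, y_train = [[1], [2], [3], [4]], [0, 0, 1, 1]
X_test = [[2.2], [2.4], [2.8]]
oracle = lambda row: int(row[0] > 2.5)

baseline = fit_stump(X_train, y_train)
coached = oracle_coach(X_train, y_train, X_test, oracle)
```

In this toy setting, the coached stump agrees with the oracle on every test instance, while the stump trained on the training data alone does not. This mirrors, at a very small scale, the fidelity gain on the specific test instances that the abstract reports.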
Keywords
decision trees; learning (artificial intelligence); pattern classification; ROC curve; UCI; decision tree; ensemble technique; genetic program; human interpretation; oracle coaching; random forest; regular training data; Accuracy; Data models; Decision trees; Predictive models; Training; Training data; Vegetation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Evolutionary Computation (CEC), 2011 IEEE Congress on
Conference_Location
New Orleans, LA
ISSN
Pending
Print_ISBN
978-1-4244-7834-7
Type
conf
DOI
10.1109/CEC.2011.5949785
Filename
5949785