Title :
One tree to explain them all
Author :
Johansson, Ulf ; Sönströd, Cecilia ; Löfström, Tuve
Author_Institution :
Sch. of Bus. & Inf., Univ. of Boras, Boras, Sweden
Abstract :
Random forest is an often used ensemble technique, renowned for its high predictive performance. Random forests models are, however, due to their sheer complexity inherently opaque, making human interpretation and analysis impossible. This paper presents a method of approximating the random forest with just one decision tree. The approach uses oracle coaching, a recently suggested technique where a weaker but transparent model is generated using combinations of regular training data and test data initially labeled by a strong classifier, called the oracle. In this study, the random forest plays the part of the oracle, while the transparent models are decision trees generated by either the standard tree inducer J48, or by evolving genetic programs. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves both accuracy and area under ROC curve, compared to using training data only. As a matter of fact, resulting single tree models are as accurate as the random forest, on the specific test instances. Most importantly, this is not achieved by inducing or evolving huge trees having perfect fidelity; a large majority of all trees are instead rather compact and clearly comprehensible. The experiments also show that the evolution outperformed J48, with regard to accuracy, but that this came at the expense of slightly larger trees.
Keywords :
decision trees; learning (artificial intelligence); pattern classification; ROC curve; UCI; decision tree; ensemble technique; genetic program; human interpretation; oracle coaching; random forest; regular training data; Accuracy; Data models; Decision trees; Predictive models; Training; Training data; Vegetation;
Conference_Titel :
Evolutionary Computation (CEC), 2011 IEEE Congress on
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-7834-7
DOI :
10.1109/CEC.2011.5949785