The problem with ranking ensembles based on training or validation performance

Author

Johansson, Ulf ; Löfström, Tuve ; Boström, Henrik

Author_Institution

School of Business and Informatics, University of Borås, Borås

fYear

2008

fDate

1-8 June 2008

Firstpage

3222

Lastpage

3228

Abstract

The main purpose of this study was to determine whether results on training or validation data can be used to estimate ensemble performance on novel data. With the specific setup evaluated, i.e., ensembles built from a pool of independently trained neural networks with diversity targeted only implicitly, the answer is a resounding no. Experiments on 13 UCI datasets show that, in general, nothing is gained in performance on novel data by choosing an ensemble based on any of the training measures evaluated here. This holds even though the measures evaluated include all of the most frequently used ones: ensemble training and validation accuracy, base classifier training and validation accuracy, ensemble training and validation AUC, and two diversity measures. The main reason is that all ensembles tend to have quite similar performance unless the accuracy of the base classifiers is deliberately lowered. The key consequence is that a data miner can do no better than picking an ensemble at random. In addition, the results indicate that it is futile to look for an algorithm that optimizes ensemble performance by selecting a subset of the available base classifiers.
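A minimal sketch, not the authors' code, of the kind of check the abstract describes: enumerate candidate ensembles, score each on validation data and on held-out test data, then ask (a) how well the validation ranking agrees with the test ranking and (b) whether the top-ranked ensemble beats the expected test accuracy of picking an ensemble uniformly at random. Several choices here are assumptions for illustration only: candidate ensembles are all fixed-size subsets of a pool of base classifiers, members are combined by plain majority vote, agreement is measured with Spearman rank correlation, and the function names (`majority_vote_accuracy`, `spearman`, `evaluate_selection`) are hypothetical. Only validation accuracy is shown; the AUC and diversity measures mentioned above are not.

```python
from itertools import combinations
from statistics import mean


def majority_vote_accuracy(preds, labels, members):
    """Accuracy of a plain majority vote (assumed combiner) over selected members.

    preds[m][i] is the class predicted by base classifier m for instance i.
    Vote ties go to the smallest class label, an arbitrary choice.
    """
    correct = 0
    for i, y in enumerate(labels):
        votes = {}
        for m in members:
            votes[preds[m][i]] = votes.get(preds[m][i], 0) + 1
        best = max(votes.values())
        winner = min(c for c, v in votes.items() if v == best)
        correct += winner == y
    return correct / len(labels)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    # If one ranking is constant the correlation is undefined; report 0.0.
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate_selection(val_preds, val_labels, test_preds, test_labels, size):
    """Rank every size-k ensemble by validation accuracy and compare to random choice."""
    ensembles = list(combinations(range(len(val_preds)), size))
    val_acc = [majority_vote_accuracy(val_preds, val_labels, e) for e in ensembles]
    test_acc = [majority_vote_accuracy(test_preds, test_labels, e) for e in ensembles]
    # Ensemble a data miner would pick from validation accuracy (first one on ties).
    chosen = max(range(len(ensembles)), key=lambda k: val_acc[k])
    return {
        "rank_correlation": spearman(val_acc, test_acc),
        "chosen_test_acc": test_acc[chosen],
        # Expected test accuracy of picking one ensemble uniformly at random.
        "random_expected_test_acc": mean(test_acc),
    }


if __name__ == "__main__":
    # Made-up toy predictions for 4 base classifiers; not data from the paper.
    val_labels = [0, 1, 1, 0, 1]
    val_preds = [[0, 1, 1, 0, 0], [0, 1, 0, 0, 1], [1, 1, 1, 0, 1], [1, 0, 0, 1, 0]]
    test_labels = [1, 0, 1, 1, 0]
    test_preds = [[1, 0, 1, 0, 0], [1, 1, 1, 1, 0], [0, 0, 0, 1, 0], [1, 0, 0, 1, 1]]
    print(evaluate_selection(val_preds, val_labels, test_preds, test_labels, size=3))
```

Under this framing, the abstract's negative result would correspond to a rank correlation near zero and a chosen-ensemble test accuracy that is, on average over datasets, no better than the random-choice expectation.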

Keywords

data mining; learning (artificial intelligence); pattern classification; UCI dataset; base classifier training; data miner; ensemble training; independently trained neural network; ranking ensemble; Artificial neural networks; Diversity reception; Equations; Gain measurement; Informatics; Neural networks; Performance gain; Predictive models; Testing;

fLanguage

English

Publisher

IEEE

Conference_Title

Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on

Conference_Location

Hong Kong

ISSN

1098-7576

Print_ISBN

978-1-4244-1820-6

Electronic_ISBN

1098-7576

Type

conf

DOI

10.1109/IJCNN.2008.4634255

Filename

4634255