DocumentCode
1641449
Title
Using genetic programming to obtain implicit diversity
Author
Johansson, Ulf ; Sönströd, Cecilia ; Löfström, Tuve ; König, Rikard
Author_Institution
School of Business and Informatics, University of Borås, Borås
fYear
2009
Firstpage
2454
Lastpage
2459
Abstract
When performing predictive data mining, the use of ensembles is known to increase prediction accuracy, compared to single models. To obtain this higher accuracy, ensembles should be built from base classifiers that are both accurate and diverse. The question of how to balance these two properties in order to maximize ensemble accuracy is, however, far from solved and many different techniques for obtaining ensemble diversity exist. One such technique is bagging, where implicit diversity is introduced by training base classifiers on different subsets of available data instances, thus resulting in less accurate, but diverse base classifiers. In this paper, genetic programming is used as an alternative method to obtain implicit diversity in ensembles by evolving accurate, but different base classifiers in the form of decision trees, thus exploiting the inherent inconsistency of genetic programming. The experiments show that the GP approach outperforms standard bagging of decision trees, obtaining significantly higher ensemble accuracy over 25 UCI datasets. This superior performance stems from base classifiers having both higher average accuracy and more diversity. Implicitly introducing diversity using GP thus works very well, since evolved base classifiers tend to be highly accurate and diverse.
Keywords
data mining; decision trees; genetic algorithms; diverse base classifiers; genetic programming; training base classifiers; Accuracy; Bagging; Classification tree analysis; Equations; Machine learning; Predictive models; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Evolutionary Computation, 2009. CEC '09. IEEE Congress on
Conference_Location
Trondheim
Print_ISBN
978-1-4244-2958-5
Electronic_ISBN
978-1-4244-2959-2
Type
conf
DOI
10.1109/CEC.2009.4983248
Filename
4983248