DocumentCode :
4545
Title :
Reusing Genetic Programming for Ensemble Selection in Classification of Unbalanced Data
Author :
Bhowan, Urvesh ; Johnston, Michael ; Zhang, Mengjie ; Yao, Xin
Author_Institution :
Knowledge & Data Eng. Group, Trinity Coll. Dublin, Dublin, Ireland
Volume :
18
Issue :
6
fYear :
2014
fDate :
Dec. 2014
Firstpage :
893
Lastpage :
908
Abstract :
Classification algorithms can suffer from performance degradation when the class distribution is unbalanced. This paper develops a two-step approach to evolving ensembles using genetic programming (GP) for unbalanced data. The first step uses multiobjective (MO) GP to evolve a Pareto-approximated front of GP classifiers to form the ensemble by trading off the minority and majority classes against each other during learning. The MO component alleviates the reliance on sampling to artificially rebalance the data. The second step, which is the focus of this paper, proposes a novel ensemble selection approach that uses GP to automatically select the best individuals for the ensemble. This new GP approach combines multiple Pareto-approximated front members into a single composite genetic program solution that represents the (optimized) ensemble. This ensemble representation has two main advantages over traditional genetic algorithm (GA) approaches. First, by limiting the depth of the composite solution trees, we use selection pressure during evolution to find small, highly cooperative groups of individuals for the ensemble. This means that ensemble sizes are not fixed a priori (as in GA) but vary depending on the strength of the base learners. Second, we compare different function set operators in the composite solution trees to explore new ways to aggregate the member outputs and thus control how the ensemble computes its output. We show that the proposed GP approach evolves smaller, more diverse ensembles than an established ensemble selection algorithm, while performing as well as, or better than, the established approach. The evolved GP ensembles also perform well compared to other bagging and boosting approaches, particularly on tasks with high levels of class imbalance.
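Illustrative sketch (not part of the record): the composite-tree ensemble described in the abstract can be pictured with the following minimal Python sketch. It is an assumption-laden reconstruction, not the authors' implementation: leaves hold base classifiers taken from a Pareto-approximated front, internal nodes aggregate their children's outputs (a simple majority vote stands in here for the paper's function set operators), and tree depth is capped so that evolution favours small, cooperative ensembles whose size is not fixed a priori. The names Leaf, Vote, random_tree, and MAX_DEPTH are hypothetical.

import random

MAX_DEPTH = 3  # assumed depth limit on the composite solution tree

class Leaf:
    """Terminal node: one base classifier taken from the Pareto front."""
    def __init__(self, classifier):
        self.classifier = classifier
    def predict(self, x):
        return self.classifier(x)  # base learner returns a class label in {0, 1}

class Vote:
    """Internal node: majority vote over the children's predictions."""
    def __init__(self, children):
        self.children = children
    def predict(self, x):
        votes = [child.predict(x) for child in self.children]
        return int(sum(votes) >= len(votes) / 2)

def random_tree(front, depth=0):
    """Grow a random depth-limited composite tree over front members."""
    if depth >= MAX_DEPTH or (depth > 0 and random.random() < 0.3):
        return Leaf(random.choice(front))
    arity = random.choice([2, 3])
    return Vote([random_tree(front, depth + 1) for _ in range(arity)])

# Toy stand-ins for evolved GP classifiers on a two-feature input.
front = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.2),
    lambda x: int(x[0] + x[1] > 0.6),
]

ensemble = random_tree(front)
print(ensemble.predict((0.7, 0.1)))  # aggregated class label of the ensemble

In the paper such composite trees are themselves evolved, and alternative aggregation operators in the function set change how the ensemble combines member outputs; the majority vote above is only one possible choice.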
Keywords :
Pareto optimisation; approximation theory; genetic algorithms; learning (artificial intelligence); pattern classification; trees (mathematics); GP classifiers; Pareto-approximated front; bagging approach; boosting approach; composite solution trees; ensemble selection approach; genetic programming; learning; single composite genetic program solution; unbalanced data classification; Accuracy; Bagging; Genetic algorithms; Silicon; Sociology; Statistics; Training; Classification; ensemble machine learning; genetic programming; unbalanced data;
fLanguage :
English
Journal_Title :
IEEE Transactions on Evolutionary Computation
Publisher :
IEEE
ISSN :
1089-778X
Type :
jour
DOI :
10.1109/TEVC.2013.2293393
Filename :
6677603