Author_Institution :
Dept. of Math. & Comput. Sci., Northwest Nazarene Univ., Nampa, ID, USA
Abstract :
Good ensemble methods require accurate and diverse individual classifiers, but the relationship between the diversity of the individual classifiers and the accuracy of the ensemble remains unclear. In this paper, we propose a novel model called COB (core, outlier, and boundary) to quantitatively estimate the accuracy of majority voting ensembles for binary classification. In this model, we first divide the data items into three subsets, core, outlier, and boundary, based on whether the individual classifiers in the ensemble predict each item correctly. We then measure the accuracy of the ensemble on each subset and combine the results. We tested the performance of the COB model on 32 datasets from the UCI repository. The experiments used three ensemble methods (bagging, random forests, and a randomized ensemble), two ensemble sizes (7 and 51 individual classifiers), and three individual machine learning algorithms (decision trees, k-nearest neighbors, and support vector machines). In all 24 experiments, the average absolute error over the 32 datasets between the accuracies predicted by the COB model and the actual ensemble accuracies was below 5%. The experiments also showed that the COB model performs significantly better than the binomial model. The COB model suggests that, to achieve high accuracy, an ensemble of weak individual classifiers should be partly diverse rather than fully diverse; that is, the classifiers should be diverse on correctly predicted items but agree on some incorrectly predicted items.
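The abstract does not spell out how the three subsets are defined or how the per-subset accuracies are modeled, but the general idea can be sketched. The snippet below is a minimal, hypothetical illustration in Python: the function name cob_estimate, the thresholds core_thresh and outlier_thresh, and the per-subset assumptions (core items are voted correctly, outlier items incorrectly, and boundary items follow a binomial model) are assumptions made for illustration, not the paper's actual definitions.

```python
import numpy as np
from scipy.stats import binom

def cob_estimate(individual_correct, core_thresh=0.9, outlier_thresh=0.1):
    """Sketch of a COB-style accuracy estimate for a majority-voting ensemble.

    individual_correct: (n_items, n_classifiers) boolean array; entry [i, j]
        is True when classifier j predicts item i correctly.
    Returns an estimated accuracy of the majority-voting ensemble.
    """
    n_items, n_clf = individual_correct.shape
    frac = individual_correct.mean(axis=1)   # per-item fraction of correct classifiers

    core = frac >= core_thresh               # nearly all classifiers correct
    outlier = frac <= outlier_thresh         # nearly all classifiers wrong
    boundary = ~(core | outlier)             # region of real disagreement

    # Assumed per-subset models: majority vote is right on core items,
    # wrong on outlier items, and binomially distributed on boundary items.
    acc_core, acc_outlier = 1.0, 0.0
    if boundary.any():
        p = individual_correct[boundary].mean()               # mean classifier accuracy on boundary
        acc_boundary = 1.0 - binom.cdf(n_clf // 2, n_clf, p)  # P(majority is correct), odd n_clf
    else:
        acc_boundary = 0.0

    # Combine per-subset accuracies, weighted by each subset's share of the data.
    weights = np.array([core.mean(), outlier.mean(), boundary.mean()])
    accs = np.array([acc_core, acc_outlier, acc_boundary])
    return float(weights @ accs)
```

Under this reading, the estimate is compared against the actual majority-vote accuracy computed directly from the predictions; the abstract reports that such comparisons yield less than 5% average absolute error across the 32 datasets.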
Keywords :
decision trees; learning (artificial intelligence); pattern classification; support vector machines; bagging; binary classification; diverse individual classifiers; ensemble methods; k-nearest neighbors; machine learning; majority voting ensembles; prediction correctness; random forests; randomized ensemble; accuracy; machine learning algorithms; mathematical model; predictive models; majority voting; measurement