DocumentCode :
2953522
Title :
Rotation Forest and Random Oracles: Two Classifier Ensemble Methods
Author :
Rodríguez, Juan J.
Author_Institution :
Univ. of Burgos, Burgos
fYear :
2007
fDate :
20-22 June 2007
Firstpage :
3
Lastpage :
3
Abstract :
Classification methods are widely used in computer-based medical systems. Often, the accuracy of a classifier can be improved using a classifier ensemble, the combination of several classifiers. Two classifier ensembles and their results on several medical data sets will be presented: Rotation Forest (Rodriguez, Kuncheva and Alonso) and Random Oracles (Kuncheva and Rodriguez). Rotation Forest is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to simultaneously encourage individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Comparisons with various standard ensemble methods (Bagging, AdaBoost, and Random Forest) will be reported. Diversity-error diagrams reveal that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest and more diverse than those in Bagging, sometimes more accurate as well.
A random oracle classifier is a mini-ensemble formed by a pair of classifiers and a fixed, randomly created oracle that selects between them. The random oracle can be thought of as a random discriminant function which splits the data into two subsets with no regard to any class labels or cluster structure. Two random oracles have been considered: linear and spherical. A random oracle classifier can be used as the base classifier of any ensemble method. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual ensemble members. Experiments with several data sets from UCI and 11 ensemble models will be reported. Each ensemble model will be examined with and without the oracle. The results will show that all ensemble methods benefited from the new approach, most markedly so random subspace and bagging. A further experiment with seven real medical data sets will demonstrate the validity of these findings outside the UCI data collection. When using Naive Bayes Classifiers as base classifiers, the experiments show that ensembles based solely upon the spherical oracle (and no other ensemble heuristic) outrank Bagging, Wagging, Random Subspaces, AdaBoost.M1, MultiBoost and Decorate. Moreover, all these ensemble methods are better with either of the two random oracles than their standard versions without the oracles.
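The random oracle classifier described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear oracle is assumed to be the perpendicular bisector of two randomly chosen training points, and a hypothetical nearest-centroid classifier (`NearestCentroid`) stands in for the Naive Bayes or decision tree base classifiers named in the abstract. All class and method names are illustrative.

```python
import random
from collections import defaultdict


class NearestCentroid:
    """Stand-in base classifier (an assumption, not from the abstract)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One centroid per class: the per-feature mean of its rows.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class LinearOracleClassifier:
    """Mini-ensemble: a fixed random hyperplane picks one of two classifiers."""

    def __init__(self, base_factory=NearestCentroid, seed=None):
        self.base_factory = base_factory
        self.rng = random.Random(seed)

    def _side(self, x):
        # Which side of the hyperplane w.x = b the point falls on.
        score = sum(wi * xi for wi, xi in zip(self.w, x)) - self.b
        return 0 if score < 0 else 1

    def fit(self, X, y):
        # Draw two distinct training points; class labels are ignored here.
        for _ in range(100):
            p, q = self.rng.sample(X, 2)
            if p != q:
                break
        else:
            raise ValueError("could not draw two distinct training points")
        # Hyperplane: perpendicular bisector of the segment from p to q.
        self.w = [qi - pi for pi, qi in zip(p, q)]
        midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        self.b = sum(wi * mi for wi, mi in zip(self.w, midpoint))
        # Split the training data by side and train one classifier per side.
        parts = {0: ([], []), 1: ([], [])}
        for row, label in zip(X, y):
            side = self._side(row)
            parts[side][0].append(row)
            parts[side][1].append(label)
        self.members = [self.base_factory().fit(*parts[s]) for s in (0, 1)]
        return self

    def predict(self, X):
        # The oracle routes each sample to exactly one of the two members.
        return [self.members[self._side(x)].predict_one(x) for x in X]
```

In an ensemble, such a classifier would take the place of the plain base classifier, with each ensemble member given its own seed so that every member receives a different random split.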
Keywords :
decision trees; learning (artificial intelligence); medical computing; pattern classification; principal component analysis; base classifier training data; classification methods; classifier accuracy; classifier ensemble methods; computer based medical systems; decision trees; diversity-error diagrams; ensemble method base classifier; feature axes rotation; feature extraction; linear random oracle; medical data sets; naive Bayes classifiers; principal component analysis; random oracle classifier; random oracles; rotation forest; spherical random oracle; standard ensemble method comparison; Bagging; Civil engineering; Computer Society; Computer science; Data mining; Feature extraction; Machine learning; Pattern recognition; Principal component analysis; Speech;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer-Based Medical Systems, 2007. CBMS '07. Twentieth IEEE International Symposium on
Conference_Location :
Maribor
ISSN :
1063-7125
Print_ISBN :
0-7695-2905-4
Type :
conf
DOI :
10.1109/CBMS.2007.94
Filename :
4262617