Abstract :
A random spherical linear oracle (RSLO) ensemble classifier for DNA microarray gene expression data is proposed. The oracle uses hyperplane splits to assign different training (and testing) samples to two sub-classifiers of the same type, which increases the diversity of voting results because errors are not shared across sub-classifiers. Eleven classifiers were evaluated as the base classifier: k nearest neighbor (kNN), naive Bayes classifier (NBC), linear discriminant analysis (LDA), learning vector quantization (LVQ1), polytomous logistic regression (PLOG), artificial neural networks (ANN), constricted particle swarm optimization (CPSO), kernel regression (KREG), radial basis function networks (RBFN), gradient descent support vector machines (SVMGD), and least squares support vector machines (SVMLS). Logistic ensembles (PLOG) gave the best performance when used as the base classifier for RSLO. With the random hyperplane splits used in RSLO, performance degraded at the highest CV-fold and iteration numbers, whereas with the hyperplane splits of the principal direction linear oracle (PDLO), performance increased with increasing CV-fold and iteration number.
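The oracle idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hyperplane is taken as the perpendicular bisector of two randomly chosen training samples, the base classifier is a simple nearest-centroid rule (a stand-in, not one of the eleven classifiers evaluated), and members are combined by majority vote. The "spherical" aspect of RSLO is not specified in the abstract and is not modeled here. All names (`NearestCentroid`, `LinearOracleMember`, `fit_ensemble`, `predict_ensemble`) are hypothetical.

```python
import random
from collections import Counter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class NearestCentroid:
    """Stand-in base classifier (assumption, not from the paper)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist2)


class LinearOracleMember:
    """One ensemble member: a random hyperplane plus two sub-classifiers."""

    def __init__(self, rng):
        self.rng = rng

    def side(self, x):
        # True / False tells which half-space the sample falls in.
        return dot(self.w, x) >= self.b

    def fit(self, X, y):
        # Assumed split rule: perpendicular bisector of two random samples.
        i, j = self.rng.sample(range(len(X)), 2)
        a, b = X[i], X[j]
        self.w = [p - q for p, q in zip(a, b)]
        mid = [(p + q) / 2 for p, q in zip(a, b)]
        self.b = dot(self.w, mid)
        # If one side is empty (e.g. duplicate samples), use a global model.
        fallback = NearestCentroid().fit(X, y)
        self.subs = {}
        for s in (False, True):
            idx = [k for k, x in enumerate(X) if self.side(x) == s]
            if idx:
                self.subs[s] = NearestCentroid().fit(
                    [X[k] for k in idx], [y[k] for k in idx])
            else:
                self.subs[s] = fallback
        return self

    def predict_one(self, x):
        # The oracle routes the test sample to the sub-classifier for its side.
        return self.subs[self.side(x)].predict_one(x)


def fit_ensemble(X, y, n_members=11, seed=0):
    rng = random.Random(seed)
    return [LinearOracleMember(rng).fit(X, y) for _ in range(n_members)]


def predict_ensemble(members, x):
    # Majority vote over members, each trained with a different random split.
    votes = Counter(m.predict_one(x) for m in members)
    return votes.most_common(1)[0][0]
```

Because each member draws its own hyperplane, the two sub-classifiers in each member see different subsets of the training data, which is the source of diversity the abstract refers to. Swapping `NearestCentroid` for another base learner with the same `fit` / `predict_one` interface would be one way to compare base classifiers in this sketch.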
Keywords :
biology computing; genetics; pattern classification; DNA microarray gene expression data; artificial neural networks; constricted particle swarm optimization; ensemble classifier; gradient descent support vector machines; hyperplane splits; k nearest neighbor; kernel regression; learning vector quantization; least squares support vector machines; linear discriminant analysis; logistic ensembles; naive Bayes classifier; polytomous logistic regression; radial basis function networks; random spherical linear oracles; DNA; Gene expression; Logistics; Nearest neighbor searches; Niobium compounds; Support vector machine classification; Support vector machines; Voting