Title :
Meta-classification model for diabetes onset forecast: A proof of concept
Author :
Nnamoko, N.A. ; Arshad, F.N. ; England, D. ; Vora, Jiten
Author_Institution :
Sch. of Comput. & Math. Sci., Liverpool John Moores Univ., Liverpool, UK
Abstract :
We propose a robust diabetes prediction model by examining how predictions from several learning algorithms performing the same task can be combined to yield higher performance than the best individual algorithm. The task was to forecast the onset of non-insulin-dependent diabetes within a five-year period using information from previous vital sign examinations. The experimental data form a 768 × 9 array in which each row is one record: all columns but the last hold the observed inputs, and the last column holds the output. Five well-known models were trained with their associated learning algorithms (Sequential Minimal Optimization (SMO), Radial Basis Function (RBF), C4.5, Naïve Bayes and RIPPER) on the same dataset, and their performance was compared using Accuracy, area under the Receiver Operating Characteristic curve (aROC) and Speed as metrics. After the comparison, a combiner (Meta) model based on a simple Logistic Regression algorithm was trained to make the final prediction, using the outputs of the best and worst performing algorithms (ranked in the order Accuracy, aROC, Speed) as additional inputs. C4.5 performed best, with an Accuracy of 77.9% and an aROC of 83.1%. RBF performed worst, with an Accuracy of 73.6% and an aROC of 80.5%. The Meta model achieved a classification Accuracy of 77.0% with an aROC of 84.9%. The slight decline in Accuracy relative to C4.5 arose because aROC (not Accuracy) was used as the performance metric during selection.
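The combiner idea in the abstract, in which a logistic regression model receives the outputs of two base classifiers as additional inputs, can be illustrated with a minimal sketch. This is not the authors' implementation. The two base classifiers (`base_a`, `base_b`) are hypothetical threshold rules standing in for the best and worst models, the data are synthetic toy values rather than the 768 × 9 dataset, and the logistic regression is fitted with plain batch gradient descent, which is an assumption about the training procedure.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability of the positive class for one input row."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical base classifiers (stand-ins, NOT C4.5 or RBF): each one
# thresholds a single input feature and returns a hard 0/1 prediction.
def base_a(x):
    return 1.0 if x[0] > 0.5 else 0.0


def base_b(x):
    return 1.0 if x[1] > 0.5 else 0.0


def augment(X):
    """Append both base-classifier outputs to every row as extra inputs."""
    return [list(x) + [base_a(x), base_b(x)] for x in X]


def accuracy(model, X, y):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((predict_proba(model, x) >= 0.5) == (yi == 1) for x, yi in zip(X, y))
    return hits / len(y)


# Synthetic toy data (an assumption for illustration only).
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

meta_inputs = augment(X)              # original inputs + two base outputs
meta = train_logistic(meta_inputs, y)
meta_accuracy = accuracy(meta, meta_inputs, y)
```

Because the base classifiers here are fixed rules, their outputs cannot leak label information. With trained base models, the outputs passed to the combiner would normally come from held-out or cross-validated predictions.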
Keywords :
Bayes methods; classification; decision trees; diseases; feature extraction; knowledge based systems; learning (artificial intelligence); medical diagnostic computing; meta data; optimisation; patient diagnosis; radial basis function networks; regression analysis; sensitivity analysis; C4.5 method; RBF method; RIPPER method; SMO method; aROC metric; accuracy metric; classification accuracy; classifier selection; combiner model; diabetes onset forecast; experimental data array; learning algorithm performance; logistic regression algorithm; meta-classification model; metamodel; model training; naive Bayes method; noninsulin dependent diabetes; performance metric; radial basis function; receiver operating characteristics area metric; robust diabetes prediction model; row vector; sequential minimal optimization method; single output vector; speed metric; time 5 year; vital sign examination information; Accuracy; Classification algorithms; Decision trees; Diabetes; Prediction algorithms; Predictive models; Training; Decision Tree; Diabetes; Naïve Bayes; Neural Network; Rule-based classifier; Support Vector Machine;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
DOI :
10.1109/BIBM.2014.6999247