Title :
Evaluation of Feature Ranking Ensembles for High-Dimensional Biomedical Data: A Case Study
Author :
Kuncheva, Ludmila I. ; Smith, C.J. ; Syed, Y. ; Phillips, C.O. ; Lewis, K.E.
Author_Institution :
Sch. of Comput. Sci., Bangor Univ., Bangor, UK
Abstract :
Developing accurate, reliable and easy-to-use diagnostic tests depends on identifying a small set of highly discriminative biomarkers. This task can be cast as feature selection within a pattern recognition context. Medical data are usually of the "wide" type, where the number of features is substantially larger than the number of instances. With the abundance of feature ranking methods, it is difficult to pick the most suitable one and choose a final consistent feature subset. Ensembles of ranking methods have been recommended for the task, but their stability and accuracy have not been evaluated across different ranking methods. Here we present a case study consisting of 429 samples of exhaled air from smokers, 83% of whom suffer from COPD (chronic obstructive pulmonary disease). The task is to identify a discriminative subset of the 1929 volatile organic compounds (VOCs) measured through mass spectrometry. Using Pareto analysis, 16 feature ranking ensembles were evaluated with respect to three criteria: classification accuracy, area under the ROC curve and the stability of the ensemble selection. The t-statistic was rated the best among the 16 feature rankers, outperforming the currently favoured SVM ranker.
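The idea of a feature ranking ensemble scored for stability can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the stratified bootstrap, the mean-rank aggregation and the ensemble size are all assumptions made for illustration, and the pairwise consistency index (rn - k^2) / (k(n - k)) is one common stability measure that may differ from the index used in the paper. Each ensemble member ranks the features by the absolute two-sample t-statistic on a resampled copy of the data, and the ensemble keeps the k features with the best summed rank.

```python
import numpy as np

def t_statistic_scores(X, y):
    """Absolute two-sample (Welch) t-statistic per feature; y holds labels 0/1."""
    a, b = X[y == 0], X[y == 1]
    var_term = a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)
    # Small constant guards against division by zero for constant features.
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(var_term + 1e-12)

def ensemble_top_k(X, y, k, n_members=20, seed=0):
    """Hypothetical ensemble: sum t-statistic ranks over stratified bootstraps.

    Assumes each class has at least two samples.
    """
    rng = np.random.default_rng(seed)
    rank_sum = np.zeros(X.shape[1])
    for _ in range(n_members):
        # Resample with replacement inside each class so both classes survive.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int(np.sum(y == c)), replace=True)
            for c in (0, 1)
        ])
        scores = t_statistic_scores(X[idx], y[idx])
        # Double argsort turns scores into ranks; rank 0 is the best feature.
        rank_sum += np.argsort(np.argsort(-scores))
    return np.argsort(rank_sum)[:k]

def consistency_index(subset_a, subset_b, n_features):
    """Pairwise consistency of two equal-size subsets: (r*n - k^2) / (k*(n - k))."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    r = len(a & b)  # number of features selected in both subsets
    return (r * n_features - k * k) / (k * (n_features - k))
```

Running `ensemble_top_k` with different seeds or on different data splits and averaging `consistency_index` over all pairs of resulting subsets gives a single stability figure, which can then be weighed against accuracy and AUC.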
Keywords :
Pareto analysis; data handling; feature extraction; medical diagnostic computing; pattern classification; COPD; VOC; area-under-the ROC curve; chronic obstructive pulmonary disease; classification accuracy; diagnostic tests; discriminative biomarkers; ensemble selection stability; feature ranking ensemble evaluation; feature ranking methods; feature selection; high-dimensional biomedical data; mass spectrometry; pattern recognition context; t-statistic; volatile organic compounds; Accuracy; Educational institutions; Indexes; Stability criteria; Support vector machines; Vegetation; classifier ensembles; feature ranking; stability index
Conference_Title :
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-5164-5
DOI :
10.1109/ICDMW.2012.12