Title :
A comparative evaluation of feature ranking methods for high dimensional bioinformatics data
Author :
Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Napolitano, Amri
Author_Institution :
Dept. of Comput. & Electr. Eng. & Comput. Sci., Florida Atlantic Univ., Boca Raton, FL, USA
Abstract :
Feature selection is an important component of data mining analysis with high-dimensional data. Reducing the number of features in a dataset can have numerous positive implications, such as eliminating redundant or irrelevant features, decreasing development time, and improving the performance of classification models. In this work, four filter-based feature selection techniques are compared using a wide variety of bioinformatics datasets. The first three filters, χ², Relief-F, and Information Gain, are widely used techniques that are well known to many researchers and practitioners. The fourth filter, recently proposed by our research group and denoted TBFS-AUC (i.e., Threshold-Based Feature Selection technique with the AUC metric), is compared to these three commonly used techniques using three different classification performance metrics. The empirical results demonstrate the strong performance of our technique.
Keywords :
bioinformatics; data mining; χ²; Relief-F; TBFS-AUC; data mining analysis; feature ranking method; filter-based feature selection; high dimensional bioinformatics data; information gain; threshold-based feature selection; Automatic voltage control; Colon; Lungs; Measurement; Pancreas; Radio frequency; Feature selection
Conference_Title :
Information Reuse and Integration (IRI), 2011 IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4577-0964-7
Electronic_ISBN :
978-1-4577-0965-4
DOI :
10.1109/IRI.2011.6009566