Stability of Filter- and Wrapper-Based Feature Subset Selection

Author

Wald, Randall ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio

Author_Institution

Florida Atlantic Univ., Boca Raton, FL, USA

fYear

2013

fDate

4-6 Nov. 2013

Firstpage

374

Lastpage

380

Abstract

High dimensionality (too many features) is found across many data science domains. Feature selection techniques address this problem by choosing a subset of features which are more relevant to the problem at hand. These techniques can simply rank the features, but this risks including multiple features which are individually useful yet contain redundant information. Subset evaluation techniques, on the other hand, consider the usefulness of whole subsets and therefore avoid selecting redundant features. Subset-based techniques can be either filters, which apply some statistical test to the subsets to measure their worth, or wrappers, which judge features based on how effective they are when used to build a model. One known problem with subset-based techniques is stability: because redundant features are not included, slight changes to the input data can have a significant effect on which features are chosen. In this study, we explore the stability of feature subset selection, including two filter-based techniques and five choices for both the wrapper learner and the wrapper performance metric. We also introduce a new stability metric, the modified Kuncheva's consistency index, which is able to compare two feature subsets of different sizes. In addition, we consider both the stability of the feature selection technique and the average and standard deviation of feature subset size. Our results show that the Consistency feature subset evaluator has the greatest stability overall, but CFS (Correlation-Based Feature Selection) shows moderate stability with a much smaller standard deviation of feature subset size. All of the wrapper-based techniques are less stable than the filter-based techniques, although the Naïve Bayes learner using the AUC performance metric is the most stable wrapper-based approach.
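The abstract does not give the formula for the modified Kuncheva's consistency index, so the sketch below should be read as an illustration under stated assumptions, not as the authors' metric. It implements the original Kuncheva consistency index for two subsets A and B of equal size k drawn from n features, KI = (r·n − k²) / (k·(n − k)) with r = |A ∩ B|, plus a hypothetical function `generalized_consistency` for unequal sizes k_A and k_B, (r·n − k_A·k_B) / (min(k_A, k_B)·n − k_A·k_B). That generalization is chosen only because it reduces to Kuncheva's index when k_A = k_B and reaches 1 when the smaller subset lies inside the larger one. The helper `stability_summary` (also hypothetical) averages the pairwise index over subsets selected from perturbed copies of the data and reports the mean and population standard deviation of subset size, mirroring the two quantities the study considers.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Sequence, Set, Tuple


def kuncheva_index(a: Set[int], b: Set[int], n: int) -> float:
    """Original Kuncheva consistency index for two equal-size subsets of n features."""
    k = len(a)
    if len(b) != k:
        raise ValueError("the original index requires subsets of equal size")
    if k == 0 or k == n:
        raise ValueError("the index is undefined when k is 0 or n")
    r = len(a & b)  # number of features the two subsets share
    return (r * n - k * k) / (k * (n - k))


def generalized_consistency(a: Set[int], b: Set[int], n: int) -> float:
    """HYPOTHETICAL unequal-size generalization; not necessarily the paper's modified index.

    Subtracts the expected overlap of two random subsets (k_a * k_b / n, scaled by n)
    and divides by the largest achievable value of that numerator, min(k_a, k_b) * n - k_a * k_b.
    """
    ka, kb = len(a), len(b)
    r = len(a & b)
    denom = min(ka, kb) * n - ka * kb  # equals min(ka, kb) * (n - max(ka, kb))
    if denom == 0:
        raise ValueError("the index is undefined for these subset sizes")
    return (r * n - ka * kb) / denom


def stability_summary(subsets: Sequence[Set[int]], n: int) -> Tuple[float, float, float]:
    """Average pairwise consistency, mean subset size, and population std of subset size."""
    scores = [generalized_consistency(a, b, n) for a, b in combinations(subsets, 2)]
    sizes = [len(s) for s in subsets]
    return mean(scores), mean(sizes), pstdev(sizes)


if __name__ == "__main__":
    # Hypothetical subsets chosen by one technique on three perturbed copies of a 10-feature dataset.
    runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2, 4}]
    avg_ci, avg_size, sd_size = stability_summary(runs, n=10)
    print(f"stability={avg_ci:.4f} mean_size={avg_size:.3f} sd_size={sd_size:.4f}")
```

With these three runs the pairwise scores are 11/21, 1.0 and 4/9, giving an average of about 0.656, a mean subset size of about 3.33 and a size standard deviation of about 0.471. Reporting the size statistics alongside the index matters because a technique can look stable while the number of selected features swings widely from run to run.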

Keywords

Bayes methods; learning (artificial intelligence); statistical testing; correlation-based feature selection; data science domains; filter-feature subset selection; modified Kuncheva consistency index; naive Bayes learner; redundant information; stability metric; standard deviation; statistical test; subset evaluation techniques; wrapper learner; wrapper performance metric; wrapper-based feature subset selection; Indexes; Measurement; Stability criteria; Standards; Support vector machines; Twitter; Feature selection; feature subsets; filters; stability; wrappers;

fLanguage

English

Publisher

IEEE

Conference_Titel

Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on

Conference_Location

Herndon, VA

ISSN

1082-3409

Print_ISBN

978-1-4799-2971-9

Type

conf

DOI

10.1109/ICTAI.2013.63

Filename

6735274