• DocumentCode
    3227610
  • Title

    Stability of Filter- and Wrapper-Based Feature Subset Selection

  • Author

    Wald, Randall ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio

  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2013
  • fDate
    4-6 Nov. 2013
  • Firstpage
    374
  • Lastpage
    380
  • Abstract
    High dimensionality (too many features) is found across many data science domains. Feature selection techniques address this problem by choosing a subset of features which are more relevant to the problem at hand. These techniques can simply rank the features, but this risks including multiple features which are individually useful but which contain redundant information. Subset evaluation techniques, on the other hand, consider the usefulness of whole subsets, and therefore avoid selecting redundant features. Subset-based techniques can either be filters, which apply some statistical test to the subsets to measure their worth, or wrappers, which judge features based on how effective they are when building a model. One known problem with subset-based techniques is stability: because redundant features are not included, slight changes to the input data can have a significant effect on which features are chosen. In this study, we explore the stability of feature subset selection, including two filter-based techniques and five choices for both the wrapper learner and the wrapper performance metric. We also introduce a new stability metric, the modified Kuncheva's consistency index, which is able to compare two feature subsets of different size. We also consider both the stability of the feature selection technique and the average/standard deviation of feature subset size. Our results show that the Consistency feature subset evaluator has the greatest stability overall, but CFS (Correlation-Based Feature Selection) shows moderate stability with a much smaller standard deviation of feature subset size. All of the wrapper-based techniques are less stable than the filter-based techniques, although the Naïve Bayes learner using the AUC performance metric is the most stable wrapper-based approach.
  • Keywords
    Bayes methods; learning (artificial intelligence); statistical testing; correlation-based feature selection; data science domains; filter-feature subset selection; modified Kuncheva consistency index; naive Bayes learner; redundant information; stability metric; standard deviation; statistical test; subset evaluation techniques; wrapper learner; wrapper performance metric; wrapper-based feature subset selection; Indexes; Measurement; Stability criteria; Standards; Support vector machines; Twitter; Feature selection; feature subsets; filters; stability; wrappers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
  • Conference_Location
    Herndon, VA
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-2971-9
  • Type

    conf

  • DOI
    10.1109/ICTAI.2013.63
  • Filename
    6735274
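
The abstract refers to a modified version of Kuncheva's consistency index that handles subsets of different size; this record does not give that formula, so it is not reproduced here. As background only, the sketch below shows the original Kuncheva index for two subsets of equal size k drawn from n features, (r*n - k^2) / (k*(n - k)), where r is the number of shared features. The function name `kuncheva_index` and its parameters are hypothetical, chosen for illustration.

```python
def kuncheva_index(subset_a, subset_b, n_features):
    """Original Kuncheva consistency index for two equal-sized feature subsets.

    Assumption: this is the standard equal-size index, NOT the modified
    version introduced in the paper described by this record.
    """
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("the original index requires subsets of equal size")
    if not 0 < k < n_features:
        raise ValueError("subset size k must satisfy 0 < k < n_features")
    r = len(a & b)  # number of features the two subsets share
    # Corrects the raw overlap r for the overlap expected by chance (k^2 / n).
    return (r * n_features - k * k) / (k * (n_features - k))


if __name__ == "__main__":
    # Two subsets of size 3 out of 10 features, sharing 2 features.
    print(kuncheva_index({0, 1, 2}, {1, 2, 3}, 10))
```

Identical subsets score 1, and values near 0 indicate overlap no better than chance, which is why an index of this kind is used to compare feature subsets selected from slightly perturbed data.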