DocumentCode :
1831877
Title :
A survey of stability analysis of feature subset selection techniques
Author :
Khoshgoftaar, Taghi M. ; Fazelpour, Alireza ; Wang, Huanjing ; Wald, Randall
fYear :
2013
fDate :
14-16 Aug. 2013
Firstpage :
424
Lastpage :
431
Abstract :
With the proliferation of high-dimensional datasets across many application domains in recent years, feature selection has become an important data mining task because it can improve both classification performance and computational efficiency. The chosen feature subset matters not only because it can improve classification performance, but also because in some domains, knowing the most important features is an end in itself. In this latter case, one important property of a feature selection method is stability, which refers to the insensitivity (robustness) of the selected features to small changes in the training dataset. In this survey paper, we discuss the problem of stability, its importance, and various stability measures used to evaluate feature subsets. We place special focus on stability as it applies to subset evaluation approaches (whether the subsets are selected through filter-based or wrapper-based subset selection techniques), as opposed to feature ranker stability, because subset evaluation stability raises challenges that have been the subject of less research. We also discuss one domain where subset evaluation (and the stability thereof) is particularly important, but which has previously received relatively little attention for subset-based feature selection: Big Data originating from bioinformatics.
Keywords :
data analysis; data mining; pattern classification; application domains; big data; bioinformatics; classification performance; computational efficiencies; data mining task; feature ranker stability; feature subset selection techniques; high-dimensional datasets; stability analysis; subset evaluation stability; Hamming distance; Indexes; Size measurement; Stability criteria; Thermal stability; Training; Feature selection; similarity measure; stability; stability measure; subset evaluation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on
Conference_Location :
San Francisco, CA
Type :
conf
DOI :
10.1109/IRI.2013.6642502
Filename :
6642502
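Note on the abstract's definition of stability: the abstract describes stability as the insensitivity of the selected features to small changes in the training dataset, and the keywords mention similarity measures. The record does not say which measures the survey covers in detail, so the following is only a minimal sketch under one assumption: stability is scored as the average pairwise Jaccard similarity between feature subsets chosen on perturbed versions of the training data. The function names `jaccard` and `subset_stability` and the feature labels `g1` to `g4` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def subset_stability(subsets):
    """Average pairwise Jaccard similarity over all pairs of subsets.

    Assumption: each element of `subsets` is the feature subset selected
    on one perturbed (e.g. resampled) version of the training data.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected on three perturbed training sets.
runs = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]
score = subset_stability(runs)
print(f"stability = {score:.3f}")
```

A score of 1.0 means every run selected the same subset, and a score near 0 means the subsets barely overlap. Jaccard similarity is used here only because it is simple and handles subsets of different sizes; other similarity measures could be swapped into `subset_stability` without changing its structure.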