DocumentCode :
561177
Title :
Impact of Noise and Data Sampling on Stability of Feature Selection
Author :
Shanab, Ahmad Abu ; Khoshgoftaar, Taghi M. ; Wald, Randall
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Volume :
1
fYear :
2011
fDate :
18-21 Dec. 2011
Firstpage :
172
Lastpage :
177
Abstract :
High dimensionality, which arises when a dataset has a very large number of attributes, is one of the major problems in data mining. One common technique for alleviating it is feature selection: selecting the most relevant attributes and removing irrelevant and redundant ones. Much research has evaluated the performance of classifiers before and after feature selection, but little work has examined how sensitive the selected feature subsets are to changes (additions or deletions) in the dataset. In this study we evaluate the robustness of six commonly used feature selection techniques (feature rankers), investigating the impact of data sampling and class noise on the stability of feature selection. All experiments are carried out on four groups of datasets from the biology domain. We employ three sampling techniques and inject artificial class noise to better simulate real-world datasets. The results show that although no ranker consistently outperforms the others, Gain Ratio is the least stable on average. Additional tests that use the rankers to build classification models also show that a feature ranker's stability is not an indicator of its classification performance.
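The abstract describes measuring how much a ranker's selected subset changes when the data is perturbed (here by class noise). The abstract does not name the stability metric, so the following is a minimal sketch that assumes Kuncheva's consistency index averaged over all pairs of top-k subsets. The ranker `rank_by_mean_difference` is a hypothetical stand-in, not one of the six rankers studied. The 20% noise rate, subset size, and synthetic data are illustrative choices only.

```python
import itertools
import random
from typing import List, Sequence, Set


def consistency_index(a: Set[int], b: Set[int], n_features: int) -> float:
    """Kuncheva's consistency index between two equal-size feature subsets.

    Assumed metric (not named in the abstract). Returns 1.0 for identical
    subsets; values near 0 mean the overlap is about what chance would give.
    """
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("subsets must have equal size k with 0 < k < n_features")
    r = len(a & b)  # number of features the two subsets share
    return (r * n_features - k * k) / (k * (n_features - k))


def average_stability(subsets: Sequence[Set[int]], n_features: int) -> float:
    """Mean pairwise consistency over all pairs of selected subsets."""
    pairs = list(itertools.combinations(subsets, 2))
    return sum(consistency_index(a, b, n_features) for a, b in pairs) / len(pairs)


def inject_class_noise(labels: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip the binary (0/1) label of round(rate * len(labels)) random instances.

    Returns a new list; the input labels are left unchanged.
    """
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), round(rate * len(noisy))):
        noisy[i] = 1 - noisy[i]
    return noisy


def rank_by_mean_difference(X: List[List[float]], y: List[int]) -> List[int]:
    """Hypothetical stand-in ranker: score = |mean(class 1) - mean(class 0)|.

    Returns feature indices ordered from highest to lowest score.
    Assumes both classes are present in y.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 20, 5
    y = [i % 2 for i in range(60)]
    # Features 0-2 shift with the class label; the remaining features are pure noise.
    X = [[rng.gauss(label if j < 3 else 0.0, 1.0) for j in range(n)] for label in y]
    subsets = []
    for _ in range(10):
        y_noisy = inject_class_noise(y, 0.2, rng)  # 20% artificial class noise
        subsets.append(set(rank_by_mean_difference(X, y_noisy)[:k]))
    print(f"average stability: {average_stability(subsets, n):.3f}")
```

Because the index corrects for the overlap expected by chance, it stays comparable across different subset sizes k. A resampled copy of the data (for example, from one of the sampling techniques) could replace the noise-injection step to measure sensitivity to sampling instead.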
Keywords :
data mining; noise; pattern classification; sampling methods; stability; artificial class noise; biology dataset; classification model; data sampling technique; feature ranker; feature selection stability; feature selection technique; gain ratio; high dimensionality problem; Gene expression; Noise; Noise measurement; Stability criteria; bioinformatics; class imbalance; classification; feature selection; noise injection; stability
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
Type :
conf
DOI :
10.1109/ICMLA.2011.74
Filename :
6146964