DocumentCode
561177
Title
Impact of Noise and Data Sampling on Stability of Feature Selection
Author
Shanab, Ahmad Abu ; Khoshgoftaar, Taghi M. ; Wald, Randall
Author_Institution
Florida Atlantic Univ., Boca Raton, FL, USA
Volume
1
fYear
2011
fDate
18-21 Dec. 2011
Firstpage
172
Lastpage
177
Abstract
High dimensionality, which arises when a dataset contains a very large number of attributes, is one of the major problems in data mining. One common technique for alleviating it is feature selection, the process of selecting the most relevant attributes and removing irrelevant and redundant ones. Much research has evaluated the performance of classifiers before and after feature selection, but little work has examined how sensitive the selected feature subsets are to changes (additions/deletions) in the dataset. In this study we evaluate the robustness of six commonly used feature rankers, investigating the impact of data sampling and class noise on the stability of feature selection. All experiments are carried out on four groups of datasets from the biology domain. We employ three sampling techniques and generate artificial class noise to better simulate real-world datasets. The results demonstrate that although no ranker consistently outperforms the others, Gain Ratio shows the least stability on average. Additional tests using our feature rankers to build classification models also show that a feature ranker's stability is not an indicator of its performance in classification.
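The abstract's central quantity, how much a ranker's selected feature subset changes when the training data are perturbed, can be made concrete with a small sketch. The abstract does not name the stability measure, the noise-injection scheme, or the rankers' internals, so everything below is an illustrative assumption: stability is measured with Kuncheva's consistency index (one common choice for equal-size top-k subsets), class noise is injected by flipping a fixed fraction of binary labels uniformly at random, and `top_k_by_mean_difference` and `demo` are hypothetical helpers, not one of the paper's six rankers (in particular, not Gain Ratio). The data-sampling perturbations the study also examines are not modeled here.

```python
import random
from itertools import combinations
from statistics import mean


def consistency_index(a, b, n_features):
    """Kuncheva's consistency index for two feature subsets of equal size k.

    Returns 1.0 for identical subsets; values near 0 mean the overlap is about
    what random selection would give. Requires 0 < k < n_features.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("subsets must share a size k with 0 < k < n_features")
    r = len(a & b)  # number of features selected in both subsets
    return (r * n_features - k * k) / (k * (n_features - k))


def average_stability(subsets, n_features):
    """Mean consistency index over all pairs of selected subsets."""
    return mean(consistency_index(a, b, n_features)
                for a, b in combinations(subsets, 2))


def inject_class_noise(labels, rate, rng):
    """Assumed noise model: flip round(rate * n) randomly chosen 0/1 labels."""
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), round(rate * len(noisy))):
        noisy[i] = 1 - noisy[i]
    return noisy


def top_k_by_mean_difference(rows, labels, k):
    """Hypothetical stand-in ranker: |mean(feature | y=1) - mean(feature | y=0)|."""
    n_features = len(rows[0])
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = [abs(mean(r[j] for r in pos) - mean(r[j] for r in neg))
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def demo(seed=0, n_samples=100, n_features=20, k=5, noise_rate=0.1, repeats=10):
    """Toy experiment: rank features on repeatedly noised copies of one dataset."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    # Features 0-2 are shifted by the class label; the rest are pure noise.
    rows = [[rng.gauss(0, 1) + (2 * y if j < 3 else 0) for j in range(n_features)]
            for y in labels]
    subsets = [top_k_by_mean_difference(rows, inject_class_noise(labels, noise_rate, rng), k)
               for _ in range(repeats)]
    return average_stability(subsets, n_features)


if __name__ == "__main__":
    print(demo())
```

Running `demo()` ranks features on ten independently noised copies of one synthetic dataset and reports the mean pairwise consistency index: values near 1 mean the top-k subset barely moves, while values near 0 mean the overlap is no better than chance. Raising `noise_rate` would typically lower the index for this toy ranker.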
Keywords
data mining; noise; pattern classification; sampling methods; stability; artificial class noise; biology dataset; classification model; data sampling technique; feature ranker; feature selection stability; feature selection technique; gain ratio; high dimensionality problem; Gene expression; Noise measurement; Stability criteria; bioinformatics; class imbalance; classification; feature selection; noise injection;
fLanguage
English
Publisher
IEEE
Conference_Title
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location
Honolulu, HI
Print_ISBN
978-1-4577-2134-2
Type
conf
DOI
10.1109/ICMLA.2011.74
Filename
6146964