DocumentCode :
2691659
Title :
The effect of measurement approach and noise level on gene selection stability
Author :
Wald, Randall ; Khoshgoftaar, Taghi M. ; Shanab, Ahmad Abu
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2012
fDate :
4-7 Oct. 2012
Firstpage :
1
Lastpage :
5
Abstract :
Many biological datasets exhibit high dimensionality, that is, a large number of attributes (genes) per instance (sample). This problem is often addressed with feature selection, which keeps the most relevant attributes and removes irrelevant and redundant ones. Although feature selection techniques are often evaluated by the performance of classification models (i.e., models built to distinguish between classes of instances, such as cancerous vs. noncancerous samples) trained on the selected features, another important but often neglected criterion is stability: the degree of agreement among a feature selection technique's outputs when the dataset changes. A more stable feature selection technique will select the same features even when aspects of the data change. In this study we consider two approaches for evaluating the stability of feature selection techniques, each consisting of noise injection followed by feature ranking. The first approach compares the features selected from the noisy datasets with those selected from the original (clean) dataset, while the second approach performs pairwise comparisons among the results from the noisy datasets. To evaluate these two approaches, we use four biological datasets and six commonly used feature rankers. We draw two primary conclusions from our experiments. First, the rankers show different levels of stability in the face of noise; in particular, the ReliefF ranker is significantly more stable than the other rankers. Second, both approaches produced the same stability patterns, although the first approach yielded greater stability values overall. Because the first approach is also significantly less computationally expensive, future studies may use it to obtain the same results at lower cost.
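The two evaluation approaches described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract names neither the stability metric nor the noise model, so the code assumes Kuncheva's consistency index over top-k feature subsets and random flipping of binary class labels. All function names are hypothetical, and the feature rankings themselves (e.g., from ReliefF) are taken as given inputs rather than computed here.

```python
import itertools
import random
from typing import List, Sequence, Set


def inject_label_noise(labels: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Assumed noise model: flip each binary (0/1) label with probability `rate`."""
    return [1 - y if rng.random() < rate else y for y in labels]


def top_k(ranking: Sequence[int], k: int) -> Set[int]:
    """Indices of the k highest-ranked features (ranking is ordered best-first)."""
    return set(ranking[:k])


def consistency_index(a: Set[int], b: Set[int], n: int) -> float:
    """Kuncheva's consistency index for two size-k subsets of n features.

    Assumed metric: 1.0 means identical subsets; values near 0 mean the
    overlap is no better than chance.
    """
    k = len(a)
    if len(b) != k or not 0 < k < n:
        raise ValueError("subsets must share a size k with 0 < k < n")
    r = len(a & b)  # number of features the two subsets have in common
    return (r * n - k * k) / (k * (n - k))


def stability_vs_clean(clean_ranking: Sequence[int],
                       noisy_rankings: Sequence[Sequence[int]],
                       k: int, n: int) -> float:
    """Approach 1: compare each noisy-data ranking with the clean-data ranking."""
    clean = top_k(clean_ranking, k)
    scores = [consistency_index(clean, top_k(r, k), n) for r in noisy_rankings]
    return sum(scores) / len(scores)


def stability_pairwise(noisy_rankings: Sequence[Sequence[int]],
                       k: int, n: int) -> float:
    """Approach 2: compare every pair of noisy-data rankings with each other."""
    subsets = [top_k(r, k) for r in noisy_rankings]
    scores = [consistency_index(a, b, n)
              for a, b in itertools.combinations(subsets, 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    n, k = 10, 3
    clean = list(range(n))  # hypothetical clean-data ranking, best feature first
    noisy = [list(range(n)), [0, 1, 3, 2, 4, 5, 6, 7, 8, 9]]
    print(stability_vs_clean(clean, noisy, k, n))  # (1 + 11/21) / 2 = 16/21
    print(stability_pairwise(noisy, k, n))         # 11/21
```

With m noisy datasets, the first approach in this sketch performs m subset comparisons (plus one ranking of the clean data), while the second performs m(m-1)/2 comparisons, which is one concrete way the two approaches differ in cost.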
Keywords :
bioinformatics; biological techniques; data mining; genetics; feature ranking; feature selection; gene selection stability; high dimensional biological datasets; measurement approach effects; noise injection; noise level effects; noisy datasets; Cancer; Lungs; Noise; Noise measurement; Stability criteria; Stability
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4673-2559-2
Electronic_ISBN :
978-1-4673-2558-5
Type :
conf
DOI :
10.1109/BIBM.2012.6392713
Filename :
6392713