DocumentCode :
1831311
Title :
Comparison of rank-based vs. score-based aggregation for ensemble gene selection
Author :
Dittman, David J. ; Khoshgoftaar, Taghi M. ; Wald, Randall ; Napolitano, Antonio
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2013
fDate :
14-16 Aug. 2013
Firstpage :
225
Lastpage :
231
Abstract :
Gene selection is an essential step in much bioinformatics research in order to handle the thousands or tens of thousands of gene expression levels generated by gene microarrays. It is especially important that this gene selection is robust and will produce consistent results even in the face of changes to the dataset. Ensemble gene selection can help improve robustness by combining gene rankings from multiple gene selection techniques into a single gene subset. Typically this is done by performing multiple runs of feature (gene) selection, finding each gene's rank within the different runs, and aggregating these ranks into a final ranked list. However, another option exists: instead of performing the ranking on each list and then aggregating, the raw scores produced by the gene ranking algorithms (which would normally be compared to generate a ranking) are aggregated directly, and these aggregate scores are used to create a final ranking. This potentially results in a different final ranking, since adjacent genes (i.e., those with no genes in between them) which are particularly close to or far from one another will be treated as such. Also, score aggregation can help reduce computation time because the ranking step takes place only once, rather than separately for each list being aggregated. In this experiment, we use eleven DNA microarray datasets and nine univariate feature selection techniques, along with twelve feature subset sizes, to compare these two approaches on a commonly used aggregation technique: mean aggregation. The results show that for seven of the nine feature selection techniques, we see strong similarity between the two approaches, but the feature subsets are not identical. However, two of the techniques do show high levels of diversity between the two approaches. This allows us to state that further research is required in order to determine the abilities of the two approaches.
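The difference between the two approaches described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the gene labels, and the scores are hypothetical, and it assumes that every list scores the same genes and that a higher score means a more relevant gene.

```python
from statistics import mean


def rank_genes(scores):
    """Map each gene to its rank (1 = highest score; ties broken by name)."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def rank_based_mean(score_lists, k):
    """Rank every list separately, average the ranks, keep the k best genes."""
    per_list_ranks = [rank_genes(scores) for scores in score_lists]
    genes = list(score_lists[0])
    mean_rank = {g: mean(ranks[g] for ranks in per_list_ranks) for g in genes}
    # A lower mean rank is better.
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:k]


def score_based_mean(score_lists, k):
    """Average the raw scores, rank once, keep the k best genes."""
    genes = list(score_lists[0])
    mean_score = {g: mean(scores[g] for scores in score_lists) for g in genes}
    # A higher mean score is better; this is the only ranking step.
    return sorted(genes, key=lambda g: (-mean_score[g], g))[:k]


# Hypothetical scores from two runs of one feature selection technique.
runs = [
    {"g1": 0.99, "g2": 0.50, "g3": 0.49},  # g1 is far ahead of g2 and g3
    {"g1": 0.30, "g2": 0.32, "g3": 0.31},  # all three are nearly tied
]

top_by_rank = rank_based_mean(runs, k=1)    # ['g2']
top_by_score = score_based_mean(runs, k=1)  # ['g1']
```

In this toy case the large margin of g1 in the first run is lost once scores are converted to ranks, so rank-based aggregation selects g2, while score-based aggregation keeps the margin and selects g1.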
Keywords :
bioinformatics; feature extraction; genetics; lab-on-a-chip; DNA microarray datasets; bioinformatic research; ensemble gene selection technique; feature subset sizes; gene expression levels; gene feature selection; gene microarrays; gene ranking algorithms; gene rankings; gene subset; mean aggregation technique; rank-based aggregation; score-based aggregation; univariate feature selection techniques; Aggregates; Bioinformatics; Biological system modeling; DNA; Measurement; Radio frequency; Robustness; DNA Microarray; Ensemble Feature Selection; Feature List Aggregation; Similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on
Conference_Location :
San Francisco, CA
Type :
conf
DOI :
10.1109/IRI.2013.6642476
Filename :
6642476