DocumentCode :
1376148
Title :
A Top-r Feature Selection Algorithm for Microarray Gene Expression Data
Author :
Sharma, Ashok ; Imoto, Seiya ; Miyano, Satoru
Author_Institution :
Lab. of DNA Inf. Anal., Univ. of Tokyo, Tokyo, Japan
Volume :
9
Issue :
3
fYear :
2012
Firstpage :
754
Lastpage :
764
Abstract :
Most of the conventional feature selection algorithms have a drawback whereby a weakly ranked gene that could perform well in terms of classification accuracy with an appropriate subset of genes will be left out of the selection. Considering this shortcoming, we propose a feature selection algorithm in gene expression data analysis of sample classifications. The proposed algorithm first divides genes into subsets, the sizes of which are relatively small (roughly of size h), then selects informative smaller subsets of genes (of size r < h) from a subset and merges the chosen genes with another gene subset (of size r) to update the gene subset. We repeat this process until all subsets are merged into one informative subset. We illustrate the effectiveness of the proposed algorithm by analyzing three distinct gene expression data sets. Our method shows promising classification accuracy for all the test data sets. We also show the relevance of the selected genes in terms of their biological functions.
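The divide, select, and merge loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names (`select_top_r`, `top_r_merge`), the caller-supplied `score` function, the use of greedy forward selection to pick r genes from a block, and the choice to split the genes into consecutive blocks of size h are all assumptions made for the example. The abstract does not specify how subsets are scored or searched.

```python
from typing import Callable, List, Sequence


def select_top_r(candidates: Sequence[int], r: int,
                 score: Callable[[List[int]], float]) -> List[int]:
    """Pick up to r genes from candidates by greedy forward selection.

    Assumption: greedy search stands in for whatever subset search the
    paper uses; `score` rates a whole subset (e.g. classifier accuracy).
    """
    chosen: List[int] = []
    remaining = list(candidates)
    while remaining and len(chosen) < r:
        # Add the gene that most improves the score of the current subset.
        best = max(remaining, key=lambda g: score(chosen + [g]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def top_r_merge(genes: Sequence[int], h: int, r: int,
                score: Callable[[List[int]], float]) -> List[int]:
    """Divide genes into blocks of about h, then repeatedly select r and merge."""
    if not 0 < r < h:
        raise ValueError("expected 0 < r < h")
    # Assumption: blocks are consecutive slices of the gene index list.
    blocks = [list(genes[i:i + h]) for i in range(0, len(genes), h)]
    current: List[int] = []
    for block in blocks:
        # Merge the genes kept so far with the next block, then re-select r.
        current = select_top_r(current + block, r, score)
    return current
```

A call such as `top_r_merge(list(range(n_genes)), h=50, r=10, score=my_score)` would return the indices of the r genes retained after the last merge, where `my_score` is a hypothetical function that evaluates a gene subset, for instance by cross-validated classification accuracy.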
Keywords :
bioinformatics; data analysis; feature extraction; genetics; lab-on-a-chip; set theory; biological functions; classification accuracy; gene expression data analysis; gene subset; informative subset; microarray gene expression data; top-r feature selection algorithm; Accuracy; Algorithm design and analysis; Bioinformatics; Cancer; Classification algorithms; Gene expression; DNA microarray gene expression data; Feature selection; classification accuracy; top-r features; Algorithms; Databases, Factual; Gene Expression; Gene Expression Profiling; Humans; Oligonucleotide Array Sequence Analysis
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2011.151
Filename :
6081851