DocumentCode :
3398999
Title :
FCM-SVM-RFE Gene Feature Selection Algorithm for Leukemia Classification from Microarray Gene Expression Data
Author :
Tang, Yuchun ; Zhang, Yan-Qing ; Huang, Zhen
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
fYear :
2005
fDate :
25 May 2005
Firstpage :
97
Lastpage :
101
Abstract :
Selecting the genes most likely to be cancer-related from large microarray gene expression data sets is an important bioinformatics research topic because it can improve human understanding of the underlying mechanisms that cause cancer. This is essentially a feature selection problem. The huge number of genes makes an exhaustive search impossible. In this work, we propose a recursive feature elimination (RFE) algorithm named FCM-SVM-RFE for the gene selection task. In each step, similar genes are grouped into clusters by the fuzzy C-means (FCM) clustering algorithm, and a support vector machine (SVM) is then modeled in each cluster-induced space; the genes that contribute most to the margin width of the SVM survive to the next step. This process is repeated until a pre-specified number of genes remain. FCM-SVM-RFE is compared with SVM-RFE on AML/ALL microarray gene expression data. The experimental results show that FCM-SVM-RFE is more accurate than SVM-RFE in predicting unknown samples. More importantly, FCM-SVM-RFE can find compact subsets of genes, on each of which an SVM with perfect prediction accuracy can be modeled. These "most informative genes" can help biologists efficiently and effectively investigate the underlying mechanisms that cause cancer.
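The loop described in the abstract (cluster the genes, fit an SVM per cluster, keep the strongest genes, repeat) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not give the elimination schedule, the number of clusters, or the SVM solver, so the following are assumptions made only for illustration: each gene is hard-assigned to its highest-membership cluster, the squared SVM weight is used as a proxy for a gene's contribution to the margin, the weakest gene of each cluster is a removal candidate in every round, and a simple batch subgradient solver stands in for a real SVM trainer. The names `fuzzy_c_means`, `train_linear_svm`, and `fcm_svm_rfe` are hypothetical.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Return memberships u[i][k] of point i in cluster k (each row sums to 1)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, normalized per point
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(iters):
        centers = []
        for k in range(c):  # centers are membership-weighted means
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][j] for i in range(n)) / total
                            for j in range(d)])
        for i in range(n):  # standard FCM membership update
            # Squared distances, floored to avoid division by zero.
            sq = [max(sum((points[i][j] - ctr[j]) ** 2 for j in range(d)), 1e-12)
                  for ctr in centers]
            inv = [s ** (-1.0 / (m - 1.0)) for s in sq]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return u

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM by batch subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 or -1. A stand-in for a proper SVM solver (assumption)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def fcm_svm_rfe(X, y, n_keep, c=2, seed=0):
    """Eliminate genes (columns of X) until n_keep remain; return surviving indices."""
    genes = list(range(len(X[0])))
    while len(genes) > n_keep:
        # Cluster genes by their expression profile across samples.
        profiles = [[row[g] for row in X] for g in genes]
        u = fuzzy_c_means(profiles, min(c, len(genes)), seed=seed)
        clusters = {}
        for g, mu in zip(genes, u):  # hard assignment (assumption)
            clusters.setdefault(mu.index(max(mu)), []).append(g)
        candidates = []
        for members in clusters.values():
            # Fit an SVM in the space spanned by this cluster's genes only.
            Xc = [[row[g] for g in members] for row in X]
            w, _ = train_linear_svm(Xc, y)
            scores = {g: wj * wj for g, wj in zip(members, w)}
            weakest = min(members, key=scores.get)
            candidates.append((scores[weakest], weakest))
        # Drop the weakest candidates without going below n_keep (assumption).
        candidates.sort()
        drop = {g for _, g in candidates[:len(genes) - n_keep]}
        genes = [g for g in genes if g not in drop]
    return genes

# Toy data: 20 samples, 8 "genes"; gene 0 carries the class signal by construction.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(20)]
X = [[2.0 * yi + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(7)] for yi in y]
selected = fcm_svm_rfe(X, y, n_keep=2)
print(selected)
```

Comparing removal candidates across clusters by raw squared weight is a simplification, since weights from separately trained SVMs are not on a common scale; a per-cluster survival quota would be an equally plausible reading of the abstract.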
Keywords :
biology computing; cancer; feature extraction; fuzzy set theory; genetics; pattern classification; pattern clustering; support vector machines; bioinformatics; cancer-related genes; cancer-resulting mechanism; exhaustive search; fuzzy c-means clustering; gene feature selection algorithm; leukemia classification; microarray gene expression data analysis; recursive feature elimination; support vector machines; Accuracy; Bioinformatics; Biological system modeling; Classification algorithms; Clustering algorithms; Data analysis; Data mining; Gene expression; Predictive models; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems, 2005. FUZZ '05. The 14th IEEE International Conference on
Conference_Location :
Reno, NV
Print_ISBN :
0-7803-9159-4
Type :
conf
DOI :
10.1109/FUZZY.2005.1452375
Filename :
1452375