Title :
Recursive Fuzzy Granulation for Gene Subsets Extraction and Cancer Classification
Author :
Tang, Yuchun ; Zhang, Yan-Qing ; Huang, Zhen ; Hu, Xiaohua ; Zhao, Yichuan
Author_Institution :
Secure Comput. Corp., Alpharetta, GA
Abstract :
A typical microarray gene expression dataset is both extremely sparse and imbalanced. To select multiple highly informative gene subsets for cancer classification and diagnosis, this paper presents a new fuzzy granular support vector machine-recursive feature elimination algorithm (FGSVM-RFE). As a hybrid of statistical learning, fuzzy clustering, and granular computing, the FGSVM-RFE eliminates irrelevant, redundant, or noisy genes separately in different granules at different stages, and selects highly informative genes with potentially different biological functions in a balanced way. Empirical studies on three public datasets demonstrate that the FGSVM-RFE outperforms state-of-the-art approaches. Moreover, the FGSVM-RFE can extract multiple gene subsets, on each of which a classifier can be modeled with 100% accuracy. In particular, the independent testing accuracy for the prostate cancer dataset is significantly improved: the previous best result is 86% with 16 genes, whereas our best result is 100% with only eight genes. Annotation with Onto-Express indicates that the identified genes are biologically meaningful.
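The granulation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: it uses textbook fuzzy C-means (listed among the keywords) to group genes by their expression profiles, then forms overlapping granules with a hypothetical membership threshold. The function names, the toy data, and the threshold value are not taken from the paper, and the SVM-based recursive elimination that would follow inside each granule is not shown.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Textbook fuzzy C-means with fixed initial centers (illustrative only).

    points  : list of equal-length numeric vectors (here: one vector per gene)
    centers : initial cluster centers, one vector per cluster
    m       : fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k, computed against the centers
    from the start of the last iteration.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        memberships = []
        for p in points:
            # Clamp distances to avoid division by zero when a point
            # coincides with a center.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                 for d_k in dists]
            )
        # Center update: mean of the points weighted by u_ik^m
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            centers[k] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ]
    return centers, memberships


def granules_from_memberships(memberships, threshold=0.3):
    """Assign each gene to every cluster where its membership reaches the
    threshold, so a gene may fall into more than one granule.
    The threshold rule is an assumption, not the paper's criterion."""
    n_clusters = len(memberships[0])
    return [
        [i for i, u in enumerate(memberships) if u[k] >= threshold]
        for k in range(n_clusters)
    ]


# Toy data (made up): 6 genes, each measured across 3 samples.
genes = [
    [1.0, 1.1, 0.9],
    [0.9, 1.0, 1.2],
    [1.1, 0.8, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 4.8, 5.0],
    [3.0, 3.1, 2.9],  # lies between the two groups
]

# Deterministic initialization: use two of the genes as starting centers.
centers, memberships = fuzzy_c_means(genes, [genes[0], genes[3]])
granules = granules_from_memberships(memberships)
print(granules)
```

Because memberships are soft, a gene whose profile sits between two groups can be kept in both granules rather than being forced into one. In a pipeline of the kind the abstract describes, a feature elimination procedure would then be run within each granule.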
Keywords :
cancer; feature extraction; fuzzy set theory; genetics; medical computing; pattern classification; pattern clustering; recursive functions; support vector machines; tumours; SVM; biological function; cancer classification; cancer diagnosis; fuzzy clustering; fuzzy granular support vector machine; gene subsets extraction; granular computing; highly informative genes; microarray gene expression dataset; recursive feature elimination algorithm; recursive fuzzy granulation; statistical learning; clustering; fuzzy C-means clustering; gene selection; microarray gene expression data analysis; recursive feature elimination (RFE); relevance index (RI); Algorithms; Artificial Intelligence; Computational Biology; Databases, Genetic; Fuzzy Logic; Gene Expression Profiling; Humans; Male; Neoplasms; Oligonucleotide Array Sequence Analysis; Prostatic Neoplasms
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
DOI :
10.1109/TITB.2008.920787