Title :
Multivariate Feature Selection using Random Subspace Classifiers for Gene Expression Data
Author :
Kamath, Vidya P. ; Hall, Lawrence O. ; Yeatman, Timothy J. ; Eschrich, Steven A.
Author_Institution :
Univ. of South Florida, Tampa
Abstract :
Gene expression analysis techniques identify important genes that predict specified outcomes based on sample characteristics. Given the small sample sizes common to these studies and the large dimensionality of the data, feature selection methods are essential. In addition, cancer-related expression analysis often involves imbalanced datasets due to rare forms of disease. Popular methods of feature selection employ univariate techniques to identify the features most suitable for analysis. We propose a multivariate technique for selecting accurate subsets of features using an approach based on random subspaces. The random subspace method is used to explore random combinations of features, and only subspaces that produce accurate classifiers are retained. The method is tested on two independent gene expression datasets and compared with a univariate approach. The multivariate feature selection method resulted in a 33% improvement in overall classification accuracy and a 90% improvement in classification accuracy for the minority class.
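The idea of retaining only accurate random subspaces can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the base classifier, the subspace size, the accuracy threshold, or how retained subspaces are turned into a final gene list. The sketch below assumes a nearest-centroid classifier, leave-one-out accuracy, a fixed threshold, and a simple count of how often each feature appears in a retained subspace. All function names and the synthetic data are hypothetical.

```python
import random


def make_toy_data(n_major=12, n_minor=4, n_noise=8, seed=0):
    """Synthetic, imbalanced two-class data (assumption, not from the paper).

    Features 0 and 1 separate the classes; the remaining features are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            shift = 4.0 * label
            informative = [shift + rng.random(), shift + rng.random()]
            noise = [rng.uniform(0.0, 5.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the class whose centroid (over `features` only) is closest to x."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((x[f] - acc[j] / counts[label]) ** 2
                   for j, f in enumerate(features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the nearest-centroid classifier on a subspace."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)


def select_subspaces(X, y, n_subspaces=200, subspace_size=3,
                     threshold=0.9, seed=0):
    """Draw random feature subsets and keep those meeting the accuracy threshold."""
    rng = random.Random(seed)
    n_features = len(X[0])
    kept = []
    for _ in range(n_subspaces):
        features = sorted(rng.sample(range(n_features), subspace_size))
        acc = loo_accuracy(X, y, features)
        if acc >= threshold:
            kept.append((features, acc))
    return kept


def feature_counts(kept):
    """Count how often each feature occurs across the retained subspaces."""
    counts = {}
    for features, _ in kept:
        for f in features:
            counts[f] = counts.get(f, 0) + 1
    return counts


if __name__ == "__main__":
    X, y = make_toy_data()
    kept = select_subspaces(X, y, subspace_size=2)
    print(sorted(feature_counts(kept).items(), key=lambda kv: -kv[1]))
```

Because each subspace is scored as a whole, a feature is credited only when it contributes to an accurate combination, which is the multivariate aspect the abstract contrasts with univariate ranking. Overall accuracy is used here for brevity; with imbalanced classes, a per-class measure such as minority-class accuracy may be a more suitable retention criterion.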
Keywords :
cancer; cellular biophysics; genetics; medical computing; molecular biophysics; gene expression; multivariate feature selection; random subspace classifiers; Biomedical engineering; Cells (biology); Computer science; Diseases; Oncology; Pattern analysis; Testing; Tumors; classifiers; feature selection; microarray; random subspaces
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375685