Title :
Procedure for stability analysis of gene selection from cross-site gene expression data
Author :
Korecki, John N. ; Hall, Lawrence O. ; Goldgof, Dmitry ; Eschrich, Steven
Author_Institution :
Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
Typically, thousands of gene expression levels are recorded for a group of patients, so the number of features far exceeds the number of examples. To combat this, researchers may want to combine gene expression data collected at different sites into one data set, narrowing the gap between the number of features (genes) and the number of examples (samples). The imbalance makes gene selection a critical component of any process that builds models from gene expression data. For instance, when ordering cancer patients by survival time, one might assume that using genes related to cancer development and progression will yield the best model. In this paper, we explore two gene selection techniques and examine how closely the genes they select agree. We also check gene set consistency between data sets collected using the same protocols at different research institutions. We show that gene selection can yield very different gene sets when the training data differ.
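The kind of comparison described above can be illustrated with a minimal sketch. The abstract does not name the two selection techniques or the consistency measure; the sketch below assumes a two-class signal-to-noise score (suggested only by the keywords), a hypothetical binary label such as short versus long survival, and Jaccard overlap as the stability measure. The function names `signal_to_noise`, `top_k_genes`, and `jaccard` are illustrative and do not come from the paper.

```python
import numpy as np


def signal_to_noise(X, y):
    """Per-gene score (mean1 - mean0) / (std1 + std0).

    X: samples x genes expression matrix; y: binary labels (0/1).
    The two-class labeling is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    denom = pos.std(axis=0, ddof=1) + neg.std(axis=0, ddof=1)
    # Guard against genes that are constant within both classes.
    denom = np.where(denom == 0, np.finfo(float).eps, denom)
    return (pos.mean(axis=0) - neg.mean(axis=0)) / denom


def top_k_genes(X, y, k):
    """Indices of the k genes with the largest absolute score."""
    scores = np.abs(signal_to_noise(X, y))
    order = np.argsort(-scores, kind="stable")
    return set(order[:k].tolist())


def jaccard(a, b):
    """Overlap of two gene sets: |a & b| / |a | b| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Calling `top_k_genes` separately on the data from each site and passing the two resulting sets to `jaccard` gives one number between 0 and 1; values near 0 would correspond to the "very different sets" the abstract reports.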
Keywords :
cancer; data handling; genetics; medical computing; cancer patients; cross site gene expression data; gene selection; stability analysis; survival time; training data; Bioinformatics; Gene expression; Signal to noise ratio; Training; Tumors; gene expression data; signal-to-noise; stability
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4577-0652-3
DOI :
10.1109/ICSMC.2011.6083767