DocumentCode :
2369487
Title :
Constructing gene-expression based survival prediction model for Non-Small Cell Lung Cancer (NSCLC) in all stages and early stages
Author :
Wan, Ying-Wooi ; Guo, Nancy Lan
Author_Institution :
Lane Dept. of CSEE, West Virginia Univ., Morgantown, WV, USA
fYear :
2009
fDate :
1-4 Nov. 2009
Firstpage :
338
Lastpage :
338
Abstract :
Lung cancer has been the leading cancer type for the past two decades, and the overall survival rate for Non-Small Cell Lung Cancer (NSCLC) remains low at 15 percent. Current lung cancer prognosis based on the staging system has been studied and shown to be not accurate enough, especially in early stages. Therefore, a new prognostic model is desired. Using gene expression values from 442 Affymetrix U133A microarray samples, we identified a 15-gene and a 12-gene signature and constructed prediction models for NSCLC overall survival. The 442 samples were separated into three sets: UM/HLM as the training set (n = 256), MSK as the first test set (n = 104), and DFCI as the second test set (n = 82). The 15-gene signature was identified from a combination of t-test and RELIEFF feature selection. By fitting the 15 genes into a Cox proportional hazards model and using the median risk score as the cut-off, patients in the training set and both test sets were stratified into two significantly distinct survival groups (log-rank P ≤ 0.02) in Kaplan-Meier (KM) analysis. This model was further studied on early-stage patients. In KM analysis, the stratification was not significant on MSK stage 1 (log-rank P = 0.12) but was significant on DFCI stage 1 (log-rank P = 0.02). The model was also able to further stratify stage 1B patients into two distinct risk groups (log-rank P = 0.00765). The 12-gene signature was identified from a combination of t-test, SAM statistics, and RELIEFF feature selection. A Naive-Bayes classifier was used to construct the prognostic model based on 5-year survival data. The 12-gene model also provided significant stratifications (log-rank P ≤ 0.001) in KM analysis on both the training and test sets. Furthermore, the 12-gene model significantly stratified stage 1 patients on both test sets (log-rank P ≤ 0.04) in KM analysis. Consistent with the 15-gene model, the 12-gene model was able to significantly stratify stage 1B patients (log-rank P = 0.0047) but not stage 1A patients.
Results showed that both signatures were as significant as other published gene signatures of larger size. In addition, the two gene signatures shared two common genes, and the performance of the two prognostic models was comparable. Therefore, in addition to comparing the predictive performance of the models, we used IPA to further study the biological relevance of both signatures.
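The median risk-score cut-off described in the abstract can be illustrated with a minimal sketch. It assumes the risk score is the linear predictor of a fitted Cox model (the sum of coefficient times expression value over the signature genes), which is the usual construction but is not spelled out in the abstract. The coefficients, the toy expression matrix, and the function names below are all hypothetical; none of them come from the paper.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Cox-style linear predictor: sum of coefficient * expression per patient.

    `expression` is a list of per-patient rows, one value per signature gene.
    """
    return [sum(b * x for b, x in zip(coefficients, row)) for row in expression]


def stratify_by_median(scores, cutoff=None):
    """Label a patient 'high' risk if the score exceeds the cut-off, else 'low'.

    If no cut-off is given, the median of `scores` is used.
    """
    if cutoff is None:
        cutoff = median(scores)
    labels = ["high" if s > cutoff else "low" for s in scores]
    return labels, cutoff


# Hypothetical toy data: 4 patients x 3 genes (not from the study).
coefficients = [0.5, -0.25, 1.0]
training = [
    [1.0, 2.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 4.0, 0.5],
    [3.0, 1.0, 2.0],
]

scores = risk_scores(training, coefficients)
labels, cutoff = stratify_by_median(scores)
print(scores, cutoff, labels)
```

The two resulting groups would then be compared with a log-rank test on their Kaplan-Meier curves. Whether the training-set median or a per-cohort median was applied to the test sets is not stated in the abstract, so the optional `cutoff` argument leaves that choice open.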
Keywords :
cancer; cellular biophysics; feature extraction; genetics; lung; molecular biophysics; statistical analysis; Cox proportional hazard model; KM analysis; Naive-Bayes classifier; RELIEFF feature selection; SAM statistics; gene model; gene signatures; gene-expression; lung cancer prognosis; median risk scores; nonsmall cell lung cancer; prognostic model; survival prediction model; t-test; Biological system modeling; Cancer; Gene expression; Hazards; Lungs; Predictive models; Risk analysis; Statistics; Testing; NSCLC; gene signature; gene-expression based prediction model; microarray;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine Workshop, 2009. BIBMW 2009. IEEE International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-5121-0
Type :
conf
DOI :
10.1109/BIBMW.2009.5332086
Filename :
5332086