Title :
Predictive modeling of lung cancer recurrence using alternative splicing events versus differential expression data
Author :
Anderson, Paul E. ; Paul, M.R. ; McCaffrey, Victoria A. ; Wilson, Richard ; Hazard, E. Starr ; Denlinger, Chadrick E. ; Watson, Paul M. ; Watson, Dennis K.
Author_Institution :
Dept. of Comput. Sci., Coll. of Charleston, Charleston, SC, USA
Abstract :
Lung cancer is the leading cause of cancer-related deaths worldwide. Biomarker discovery has become increasingly important for the effective diagnosis, prognosis and treatment of the disease. The analysis of differential gene expression data has been the primary method for biomarker discovery. Our research demonstrates that alternative splicing events (ASE) can be another source of data for predictive model creation by identifying putative biomarkers that are complementary to those found from traditional gene expression. RNASeq data from 21 patients diagnosed with lung adenocarcinoma, a non-small cell lung carcinoma, were analyzed; 11 of these patients relapsed. After quantifying splice variants and gene expression with a bioinformatics pipeline, we created predictive models using orthogonal projections to latent structures discriminant analysis (OPLS-DA) that recognize two clinical phenotypes (disease-free and relapse), thus distinguishing between more indolent and more aggressive disease. Hierarchical clustering of samples before and after predictive model feature selection showed that clustering based on ASE was more indicative of the relapse phenotype. A novel hybrid multi-objective genetic algorithm combining alternative splicing events with gene expression was used for discriminant feature selection. A post-processing examination of the putative biomarkers found by the genetic algorithm and ranked correlation tests demonstrate that the analysis of alternative splicing events provides complementary and non-redundant predictive power by identifying biologically relevant patterns that do not result in differential gene expression.
Keywords :
RNA; bioinformatics; cancer; cellular biophysics; data analysis; feature selection; genetic algorithms; genomics; lung; molecular biophysics; patient diagnosis; tumours; RNASeq data; aggressive disease; alternative splicing events; bioinformatics pipeline; biologically relevant patterns; biomarker discovery; cancer-related deaths; clinical phenotypes; complementary predictive power; differential gene expression data analysis; disease diagnosis; disease prognosis; disease treatment; hierarchical clustering; indolent disease; latent structures discriminate analysis; lung adenocarcinoma; nonredundant predictive power; nonsmall cell lung carcinoma; novel hybrid multiple objective genetic algorithm; orthogonal projections; patient diagnosis; post-predictive model feature selection; post-processing examination; predictive lung cancer recurrence modeling; predictive model creation; prepredictive model feature selection; putative biomarkers; ranked correlation tests; relapse phenotype; splice variants; traditional gene expression; Cancer; Gene expression; Genetic algorithms; Lungs; Predictive models; RNA; Splicing;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location :
Honolulu, HI
DOI :
10.1109/CIBCB.2014.6845521