Title :
Improved Feature Selection by Incorporating Gene Similarity Into the LASSO
Author :
Gillies, C.E. ; Gao, X. ; Patel, Neel V. ; Siadat, M.R. ; Wilson, G.D.
Author_Institution :
Dept. of Comp. Sci. & Eng., Oakland Univ., Rochester, MI, USA
Abstract :
Personalized medicine is customizing treatments to a patient's genetic profile, and it has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult, because there are usually few patients and thousands of genes. This leads to the curse of dimensionality. In order to combat this problem, some researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. We propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model's objective function. Our enhancement gives preference to the selection of genes that are involved in similar biological processes. We expect this to be the case because co-expressed genes are likely to be involved in related pathways. Our modified LASSO selects similar genes by penalizing interaction terms between genes. We devised a coordinate descent algorithm to minimize the corresponding objective function. To evaluate our method, we created simulation data where we compared our model to the standard LASSO model and an interaction LASSO model. Our model outperformed both the standard LASSO and the interaction model in terms of detecting important genes and gene interactions for a reasonable number of training samples. This preliminary study leads us to believe that our method has the potential to compete with state-of-the-art methods in gene expression analysis.
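For orientation, the standard LASSO baseline mentioned in the abstract can be fitted by cyclic coordinate descent with soft-thresholding. The sketch below is a minimal illustration of that baseline only, not the authors' modified method: it does not include the gene-similarity or interaction penalty. The function names, the 1/(2n) scaling of the squared error, and the fixed iteration count are assumptions made for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink the scalar z toward zero by t (the LASSO proximal step)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Standard LASSO by cyclic coordinate descent (illustrative sketch).

    Minimizes (1 / (2n)) * ||y - X @ beta||^2 + lam * ||beta||_1.
    No intercept is fitted; center X and y beforehand if one is needed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature x_j^T x_j / n
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta
```

A typical call would be `beta = lasso_coordinate_descent(X, y, lam=0.1)`, where `X` is a samples-by-genes expression matrix and `y` the response. Larger values of `lam` drive more coefficients exactly to zero, which is the sparsity property the abstract relies on for feature selection.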
Keywords :
digital simulation; learning (artificial intelligence); medical computing; LASSO; biological processes; coexpressed genes; coordinate descent algorithm; feature selection; gene similarity; medical practice; patient genetic profile; personalized medicine; simulation data; supervised learning algorithms; Biological system modeling; Gene expression; Linear programming; Linear regression; Mathematical model; Ontologies; Semantics; Gene Expression; Gene Ontology; LASSO; Regression; Semantic Similarity;
Conference_Title :
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-5164-5
DOI :
10.1109/ICDMW.2012.59