DocumentCode :
1989251
Title :
Discrete Methods for Association Search and Status Prediction in Genotype Case-Control Studies
Author :
Brinza, Dumitru ; Zelikovsky, Alexander
Author_Institution :
Univ. of California at San Diego, La Jolla, California
fYear :
2007
fDate :
14-17 Oct. 2007
Firstpage :
270
Lastpage :
277
Abstract :
Recent improvements in high-throughput genotyping technology make genome-wide association studies and status prediction (classification) possible for common complex diseases. This paper addresses three challenges commonly facing such studies: (i) searching an enormous number of possible gene interactions, (ii) validating the reproducibility of associations, and (iii) reliably predicting disease status. These challenges have traditionally been addressed with statistical methods, whereas here we apply computational approaches: optimization and cross-validation. A complex risk factor is modeled as a subset of SNPs with specified alleles, and the optimization formulation asks for the one with the maximum odds ratio. When searching for a disease-associated risk factor, we show that greedy heuristics are much faster and, within a reasonable amount of time, lead to significantly better solutions than exhaustive search. We propose a novel randomized complementary greedy search method that has advantages over the previously best search method. To measure and compare the ability of search methods to find reproducible risk factors, we propose applying a cross-validation scheme usually used for validating prediction. The proposed heuristic association search methods promise better reproducibility than exhaustive searches. We then show that k-fold cross-validation is more reliable than leave-one-out cross-validation for evaluating disease status prediction methods, since it captures the overtraining effect. We have applied known search methods with the proposed enhancements, as well as status prediction methods based on these search methods, to real case-control studies for several diseases (Crohn's disease, autoimmune disorder, tick-borne encephalitis, lung cancer, and rheumatoid arthritis). 2- and 3-fold cross-validations show that the new methods find strongly associated risk factors and reliably predict disease status for the considered case-control studies.
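The abstract's optimization formulation (a risk factor is a subset of SNPs with specified alleles, scored by its odds ratio) can be illustrated with the minimal sketch below. It is not the authors' implementation: it shows a plain greedy search, not the paper's randomized complementary greedy method, and it assumes genotypes coded as 0/1/2 per SNP, a risk factor represented as a hypothetical mapping from SNP index to required genotype value, and a 0.5 continuity correction so that empty cells in the 2x2 table do not cause division by zero. The names `has_factor`, `odds_ratio`, and `greedy_search` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: SNP index -> required genotype value (assumed 0/1/2 coding).
RiskFactor = Dict[int, int]


def has_factor(genotype: Sequence[int], factor: RiskFactor) -> bool:
    """True if the individual carries every SNP value required by the risk factor."""
    return all(genotype[snp] == value for snp, value in factor.items())


def odds_ratio(cases: List[Sequence[int]], controls: List[Sequence[int]],
               factor: RiskFactor, pseudo: float = 0.5) -> float:
    """Odds ratio of the risk factor from the 2x2 case/control table.

    The pseudo-count is an assumption (continuity correction), not from the paper.
    """
    a = sum(has_factor(g, factor) for g in cases)     # cases carrying the factor
    b = len(cases) - a                                 # cases without it
    c = sum(has_factor(g, factor) for g in controls)  # controls carrying the factor
    d = len(controls) - c                              # controls without it
    return ((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo))


def greedy_search(cases: List[Sequence[int]], controls: List[Sequence[int]],
                  max_size: int = 3,
                  values: Sequence[int] = (0, 1, 2)) -> Tuple[RiskFactor, float]:
    """Plain greedy search: repeatedly add the single (SNP, value) pair that most
    increases the odds ratio, stopping when no pair improves it or max_size is reached."""
    n_snps = len(cases[0])
    factor: RiskFactor = {}
    best = odds_ratio(cases, controls, factor)
    for _ in range(max_size):
        candidate = None
        for snp in range(n_snps):
            if snp in factor:
                continue
            for value in values:
                trial = {**factor, snp: value}
                score = odds_ratio(cases, controls, trial)
                if score > best:
                    best, candidate = score, trial
        if candidate is None:  # no single extension improves the odds ratio
            break
        factor = candidate
    return factor, best
```

For example, with cases `[[1, 0], [1, 2], [1, 1], [0, 0]]` and controls `[[0, 0], [0, 1], [2, 2], [0, 2]]`, the sketch selects the single-SNP factor `{0: 1}` with an odds ratio of 21.0 and stops, because adding the second SNP lowers the score. Each greedy step costs one pass over all (SNP, value) pairs, which is why such heuristics scale far better than enumerating every SNP subset exhaustively.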
Keywords :
cellular biophysics; diseases; genetics; medical computing; prediction theory; search engines; cross-validation; disease associated risk factor; genotype case-control studies; greedy heuristics; optimization; status prediction methods; Arthritis; Bioinformatics; Cancer; Diseases; Genomics; Lungs; Prediction methods; Reproducibility of results; Search methods; Statistics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
Type :
conf
DOI :
10.1109/BIBE.2007.4375576
Filename :
4375576