• DocumentCode
    1468367
  • Title

    Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression

  • Author

    Chen, Carla Chia-Ming ; Schwender, Holger ; Keith, Jonathan ; Nunkesser, Robin ; Mengersen, Kerrie ; Macrossan, Paula

  • Author_Institution
    Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
  • Volume
    8
  • Issue
    6
  • fYear
    2011
  • Firstpage
    1580
  • Lastpage
    1591
  • Abstract
    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
  • Keywords
    Monte Carlo methods; belief networks; genetics; genomics; medical computing; molecular biophysics; molecular configurations; Bayesian logistic regression; Genetic Programming for Association Studies; Monte Carlo logic regression; SNP interactions; logic feature selection; logic regression; modified logic regression-gene expression programming; random forest; random forests; real genotype data; single nucleotide polymorphism; stochastic search variable selection; tree-like structures; Bayesian methods; Genetic programming; Mathematical model; Regression analysis; Bayesian logistic regression with stochastic search algorithm; Logic regressions; candidate gene search; Bayes Theorem; Computational Biology; Genotype; Logistic Models; Monte Carlo Method; Polymorphism, Single Nucleotide
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2011.46
  • Filename
    5728791