• DocumentCode
    65108
  • Title

    Reliable and Fast Estimation of Recombination Rates by Convergence Diagnosis and Parallel Markov Chain Monte Carlo

  • Author

    Jing Guo ; Jain, R. ; Peng Yang ; Rui Fan ; Chee Keong Kwoh ; Jie Zheng

  • Author_Institution
    School of Computer Engineering, Nanyang Technological University, Singapore
  • Volume
    11
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan.-Feb. 2014
  • Firstpage
    63
  • Lastpage
    72
  • Abstract
    Genetic recombination is an essential event during meiosis that results in an exchange of segments between paired chromosomes. Estimating the recombination rate is crucial for understanding the process of recombination. Experimental methods are usually difficult and limited to small-scale estimation, so statistical methods using population genetics data are important for large-scale analysis. LDhat is a widely used statistical method that uses a reversible jump MCMC (rjMCMC) algorithm to predict recombination rates. Due to the complexity of the rjMCMC scheme, LDhat may take a long time on large SNP data sets. In addition, the rjMCMC parameters must be defined manually in the original program, and they directly affect the results. To address these issues, we designed an improved algorithm based on LDhat that implements MCMC convergence diagnostic algorithms to automatically predict parameter values and monitor the mixing process. Parallel computation methods were then employed to further accelerate the new program. The new algorithms were tested on ten samples from the HapMap phase 2 data set. The results were compared with those of the previous code and were nearly identical. However, our new methods achieved significant acceleration, showing that they are more efficient and reliable for the estimation of recombination rates. The stand-alone package is freely available for download at http://www.ntu.edu.sg/home/zhengjie/software/CPLDhat.
  • Keywords
    Markov processes; Monte Carlo methods; biology computing; genetics; parallel processing; statistical analysis; HapMap phase 2 data set; LDhat; MCMC convergence diagnostic algorithms; convergence diagnosis; fast estimation; genetic recombination; large-scale analysis; meiosis; paired chromosomes; parallel Markov chain Monte Carlo; parallel computation methods; population genetics data; recombination rates; reliable estimation; rjMCMC algorithm; rjMCMC parameters; rjMCMC scheme; segment exchange; statistical methods; Algorithm design and analysis; Bioinformatics; Convergence; Estimation; Prediction algorithms; Program processors; Recombination hotspot; genome instability; parallel computation; reversible jump MCMC
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.133
  • Filename
    6646172