• DocumentCode
    82869
  • Title

    Coalescent-Based Method for Learning Parameters of Admixture Events from Large-Scale Genetic Variation Data

  • Author

    Tsai, Ming-Chi ; Blelloch, Guy ; Ravi, Reshma ; Schwartz, R.

  • Author_Institution
    Joint CMU-Pitt PhD Program in Comput. Biol., Pittsburgh, PA, USA
  • Volume
    10
  • Issue
    5
  • fYear
    2013
  • fDate
    Sept.-Oct. 2013
  • Firstpage
    1137
  • Lastpage
    1149
  • Abstract
    Detecting and quantifying the timing and the genetic contributions of parental populations to a hybrid population is an important but challenging problem in reconstructing evolutionary histories from genetic variation data. With the advent of high-throughput genotyping technologies, new methods suitable for large-scale data are especially needed. Furthermore, existing methods typically assume that the assignment of individuals to subpopulations is known, when that assignment is itself a difficult problem that is often unresolved for real data. Here, we propose a novel method that combines prior work on inferring nonreticulate population structures with an MCMC scheme for sampling over admixture scenarios, both to identify population assignments and to learn divergence times and admixture proportions for those populations from genome-scale admixed genetic variation data. We validated our method using coalescent simulations and a collection of real bovine and human variation data. On simulated sequences, our method shows better accuracy and faster runtime than leading competing methods in estimating admixture fractions and divergence times. Analysis of the real data further shows our method to be effective at matching the best current knowledge about the relevant populations.
  • Keywords
    genetics; genomics; large-scale systems; learning (artificial intelligence); MCMC scheme; admixture events; admixture fractions; admixture scenarios; bovine variation data; coalescent-based method; genome-scale admixed genetic variation data; high throughput genotyping technologies; human variation data; large-scale genetic variation data; learning parameters; nonreticulate population structures; Bioinformatics; Computational modeling; Genomics; Sociology; Statistics; Biology and genetics; computations on discrete structures; graphs and networks; information theory
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.98
  • Filename
    6579604