Title :
Parallel Pair-HMM SNP Detection
Author :
Clement, Nathan L. ; Shepherd, Brent A. ; Bodily, Paul ; Tumur-Ochir, Sukhbat ; Gim, Younghoon ; Snell, Quinn ; Clement, Mark J. ; Johnson, W. Evan
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
Abstract :
I. MOTIVATION: Because each instrument run generates massive amounts of data, next-generation sequencing technologies present researchers with unique analytical challenges that require innovative, computationally efficient statistical solutions. Here we present a parallel implementation of a probabilistic Pair-Hidden Markov Model for base calling and SNP detection in next-generation sequencing data. Our approach incorporates multiple sources of error into the base calling procedure, which leads to more accurate results. In addition, it applies a likelihood ratio test that gives researchers straightforward SNP calling cutoffs based on either a p-value cutoff or false discovery control. II. RESULTS: We have developed GNUMAP-SNP, a highly accurate approach for identifying SNPs in next-generation sequencing data. By using a novel probabilistic Pair-Hidden Markov Model, GNUMAP-SNP accounts for uncertainty in both the read calls and the read mapping in an unbiased fashion. Our results show that GNUMAP-SNP has both high sensitivity and high specificity throughout the genome, especially in repeat regions or in areas with low read coverage. In addition, we propose a statistical framework that accounts for background noise using straightforward statistical cutoffs that filter out false-positive results. The parallel implementation of SNP calling achieves near-linear speedup on distributed-memory or shared-memory platforms. III. AVAILABILITY: GNUMAP-SNP is available as a module in the GNUMAP probabilistic read mapping software. GNUMAP is freely available for download at: http://dna.cs.byu.edu/gnumap/.
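The abstract mentions SNP calling cutoffs based on a likelihood ratio test with either a p-value cutoff or false discovery control. The sketch below illustrates that general idea only; it is not GNUMAP-SNP's actual test. It assumes a simplified binomial model in which each position has a probability-weighted count of non-reference bases, and the null hypothesis is that those bases arise from a fixed background error rate. The function names, the `error_rate` value, the example counts, and the use of the Benjamini-Hochberg procedure for false discovery control are all assumptions made for illustration.

```python
import math


def lrt_pvalue(alt_weight, total_weight, error_rate=0.01):
    """Approximate p-value for "more non-reference signal than background".

    alt_weight and total_weight may be fractional, e.g. posterior-weighted
    base counts. H0: non-reference fraction equals error_rate (assumed).
    H1: non-reference fraction equals its maximum-likelihood estimate.
    """
    if total_weight <= 0:
        return 1.0
    p_hat = alt_weight / total_weight
    if p_hat <= error_rate:
        return 1.0  # no excess over background, nothing to call

    def xlogy(x, y):
        # Treat 0 * log(0) as 0 so that p_hat == 1 is handled.
        return 0.0 if x == 0 else x * math.log(y)

    stat = 2.0 * (
        xlogy(alt_weight, p_hat / error_rate)
        + xlogy(total_weight - alt_weight, (1.0 - p_hat) / (1.0 - error_rate))
    )
    stat = max(stat, 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return the sorted indices of tests accepted at the given FDR level."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical (non-reference weight, total weight) pairs for four positions.
positions = [(0.1, 20.0), (9.5, 20.0), (19.8, 20.0), (1.0, 30.0)]
pvalues = [lrt_pvalue(alt, total) for alt, total in positions]
snp_calls = benjamini_hochberg(pvalues, fdr=0.05)
print(snp_calls)
```

With these made-up counts, only the two positions with a large non-reference fraction pass the cutoff. A real caller would also have to model heterozygous versus homozygous genotypes and the full four-nucleotide distribution, which this sketch omits.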
Keywords :
bioinformatics; distributed shared memory systems; genetics; hidden Markov models; probability; GNUMAP probabilistic read mapping software; SNP calling cutoffs; background noise; base calling procedure; distributed memory; false discovery control; genome; likelihood ratio testing; parallel pair-HMM SNP detection; probabilistic pair-hidden Markov model; sequencing technology; shared memory platform; statistical cutoff; statistical solution; Bioinformatics; Frequency modulation; Genomics; Markov processes; Next generation networking; Probabilistic logic; Probability; biology computing; next-generation sequencing; parallel computing; sequence mappers; short-read mapping;
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
DOI :
10.1109/IPDPSW.2012.84