Title :
Fast Epistasis Detection in Large-Scale GWAS for Intel Xeon Phi Clusters
Author :
Glenn R. Luecke;Nathan T. Weeks;Brandon M. Groth;Marina Kraeva;Li Ma;Luke M. Kramer;James E. Koltes;James M. Reecy
Author_Institution :
Dept. of Math., Iowa State Univ., Ames, IA, USA
Abstract :
epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) that affect quantitative traits in genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on data sets with many quantitative traits and markers. However, the explosion in genome sequencing will lead to the creation of large-scale data sets that will overwhelm EPISNPmpi's ability to compute results in a reasonable amount of time. Thus, epiSNP was rewritten to handle these large data sets efficiently. This was accomplished by performing serial optimizations, improving MPI load balancing, and introducing parallel OpenMP directives to further enhance load balancing and to allow execution on the Intel Xeon Phi coprocessor (MIC). These changes resulted in new scalable versions of epiSNP using MPI, MPI+OpenMP, and MPI+OpenMP with one or two MICs. For a large data set of 774,660 SNPs and 1,634 individuals, the runtime on 126 nodes of TACC's Stampede supercomputer was 10.61 minutes without MICs and 5.13 minutes with 2 MICs. These runtimes correspond to speedups over EPISNPmpi of 17X without MICs and 36X with 2 MICs.
Keywords :
"Arrays","Optimization","Algorithms","Load management","Microwave integrated circuits","Animals","Genomics"
Conference_Titel :
Trustcom/BigDataSE/ISPA, 2015 IEEE
DOI :
10.1109/Trustcom.2015.637