DocumentCode :
723697
Title :
merAligner: A Fully Parallel Sequence Aligner
Author :
Georganas, Evangelos ; Buluc, Aydin ; Chapman, Jarrod ; Oliker, Leonid ; Rokhsar, Daniel ; Yelick, Katherine
Author_Institution :
Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
561
Lastpage :
570
Abstract :
Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. MerAligner relies on a high-performance distributed hash table (the seed index) and uses the one-sided communication capabilities of Unified Parallel C (UPC) to facilitate fine-grained parallelism. We leverage communication optimizations during construction of the distributed hash table, and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties that enable exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O-intensive phases and implement an effective load-balancing scheme. Results show that merAligner scales efficiently up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data, while significantly outperforming existing parallel alignment tools.
Keywords :
C language; bioinformatics; cache storage; optimisation; parallel processing; resource allocation; Cray XC30 supercomputer; I/O intensive phases; aligning phase; bioinformatics; communication optimizations; communication reduction; fine-grained parallelism; high performance distributed hash table; load balancing scheme; merAligner; one-sided communication capabilities; parallel sequence aligner; query sequences; seed index; seed-and-extend algorithm; sequence matching; software caching schemes; unified parallel C; wheat genome data; Bioinformatics; Data structures; Genomics; Indexes; Load management; Optimization; Software;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.96
Filename :
7161544