DocumentCode :
1783233
Title :
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU
Author :
Jing Zhang ; Hao Wang ; Heshan Lin ; Wu-Chun Feng
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
251
Lastpage :
260
Abstract :
BLAST, short for Basic Local Alignment Search Tool, is a fundamental algorithm in the life sciences that compares biological sequences. However, with the advent of next-generation sequencing (NGS) and the increase in sequence read-lengths, whether at the outset or downstream from NGS, the exponential growth of sequence databases is arguably outstripping our ability to analyze the data. Though several recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies used coarse-grained parallel approaches, where one sequence alignment is mapped to only one thread. Moreover, due to the irregular memory access patterns in BLASTP, there remain significant challenges in mapping the most time-consuming phases (i.e., hit detection and ungapped extension) to the GPU using a fine-grained multithreaded approach. To address the above issues, we propose cuBLASTP, an efficient fine-grained BLASTP implementation for the GPU using CUDA. Our cuBLASTP realization encompasses many research contributions, including (1) memory-access reordering to reorder hits from column-major order to diagonal-major order, (2) position-based indexing to map a hit with a packed data structure to a bin, (3) aggressive hit filtering to eliminate hits beyond the threshold distance along the diagonal, (4) diagonal-based parallelism and hit-based parallelism for ungapped extension to extend sequences with different lengths in databases, and (5) hierarchical buffering to reduce memory-access overhead for the core data structures. The experimental results show that on an NVIDIA Kepler GPU, cuBLASTP delivers up to a 5.0-fold speedup over sequential FSA-BLAST and a 3.7-fold speedup over multithreaded NCBI-BLAST for the overall program execution. In addition, compared with GPU-BLASTP (the fastest GPU implementation of BLASTP to date), cuBLASTP achieves up to a 2.8-fold speedup for the kernel execution on the GPU and a 1.8-fold speedup for the overall program execution.
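Illustrative note (not part of the original record): contributions (1) to (3) in the abstract can be pictured with a small sequential sketch. It is written in Python rather than CUDA, and it is not the authors' implementation. The function names, the `(query_pos, subject_pos)` hit representation, the diagonal numbering, and the rule "keep a hit only if the preceding hit on the same diagonal lies within the threshold" are all assumptions made for illustration.

```python
from collections import defaultdict


def diagonal_id(query_pos, subject_pos, query_len):
    # Assumed numbering: offset by query_len so the diagonal index is never negative.
    return subject_pos - query_pos + query_len


def bin_hits_by_diagonal(hits, query_len):
    """Group (query_pos, subject_pos) hits into one bin per diagonal.

    Hits usually arrive ordered by subject position (column-major order);
    the result is ordered along each diagonal instead (diagonal-major order).
    """
    bins = defaultdict(list)
    for query_pos, subject_pos in hits:
        bins[diagonal_id(query_pos, subject_pos, query_len)].append(subject_pos)
    return {diag: sorted(positions) for diag, positions in bins.items()}


def filter_hits(bins, threshold):
    """Keep a hit only if the previous hit on its diagonal is within `threshold`."""
    kept = {}
    for diag, positions in bins.items():
        survivors = [
            cur
            for prev, cur in zip(positions, positions[1:])
            if cur - prev <= threshold
        ]
        if survivors:
            kept[diag] = survivors
    return kept


# Hypothetical hits, listed in column-major (subject-position) order.
hits = [(0, 2), (5, 3), (2, 4), (1, 30)]
bins = bin_hits_by_diagonal(hits, query_len=10)
candidates = filter_hits(bins, threshold=5)
```

In this sketch, only hits that survive filtering would be passed on to ungapped extension. A GPU version would additionally need to decide how bins and hits are assigned to threads, which the abstract addresses through diagonal-based and hit-based parallelism.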
Keywords :
bioinformatics; data analysis; data structures; graphics processing units; molecular biophysics; multi-threading; parallel architectures; proteins; storage management; BLAST algorithm; CUDA; GPU; NGS; NVIDIA Kepler GPU; aggressive hit filtering; basic local alignment search tool; biological sequences; coarse-grained parallel approach; column-major order; core data structures; cuBLASTP; diagonal-based parallelism; diagonal-major order; fine-grained BLASTP implementation; fine-grained multithreaded approach; fine-grained parallelization; graphics processing unit; hierarchical buffering; hit-based parallelism; irregular memory access patterns; life sciences; memory-access overhead reduction; memory-access reordering; next-generation sequencing; packed data structure; position-based indexing; program execution; protein sequence search; sequence alignment; sequence databases; sequence read-lengths; threshold distance; ungapped sequence extension; Acceleration; Algorithm design and analysis; Databases; Instruction sets; BLASTP; hit detection; ungapped extension
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.36
Filename :
6877260