DocumentCode :
3740670
Title :
Parallel Read Error Correction for Big Genomic Datasets
Author :
Nagakishore Jammula;Sriram Chockalingam;Srinivas Aluru
Author_Institution :
Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2015
Firstpage :
446
Lastpage :
455
Abstract :
Genome sequencing, using instruments in vogue today, deciphers on the order of a billion short genomic fragments per run. These fragments are a few hundred bases long and are commonly referred to as "reads". Reads contain errors due to limitations of sequencing technology. Read error correction enhances the quality of results produced by applications in areas such as genomics, metagenomics, and transcriptomics. Use of error-corrected reads also improves the runtime and the memory usage of such applications. Sequential error correction tools cannot cope with the large number of reads produced by modern-day sequencing instruments. A distributed-memory Parallel Spectrum-based Error Correction (PSbEC) algorithm was proposed to overcome this drawback [1]. In this work, we propose techniques to address three major shortcomings of the PSbEC algorithm. Our optimizations enhance the scope and the speedup of the PSbEC algorithm, thereby enabling error correction of big genomic datasets. More specifically, by combining our optimizations, we are able to achieve a cumulative speedup of up to 11x. Further, we demonstrate error correction of a human dataset containing nearly 1.55 billion reads. This work stands as the first demonstration of distributed-memory genomic read error correction for a dataset consisting of more than a billion reads.
Keywords :
"Error correction","Genomics","Bioinformatics","Sequential analysis","Instruments","Optimization","Parallel algorithms"
Publisher :
IEEE
Conference_Title :
High Performance Computing (HiPC), 2015 IEEE 22nd International Conference on
Type :
conf
DOI :
10.1109/HiPC.2015.47
Filename :
7397660