DocumentCode :
652274
Title :
Genome Assembly on a Multicore System
Author :
Biswas, Arijit ; Ranjan, Desh ; Zubair, Mohammad
Author_Institution :
Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
1233
Lastpage :
1240
Abstract :
The genome assembly problem is to reconstruct the original DNA sequence of an organism from a large set of short (400bp-500bp) overlapping fragments. The problem is particularly challenging in the presence of repeats, which are multiple identical or nearly identical stretches of DNA. MIRA is an open-source assembler that is widely used by biologists and works effectively in the presence of repeats. However, it is computationally intensive: for example, an assembly of one million fragments requires about 18.3 hours. The computation in the MIRA assembler is dominated by the contig-building phase, which is highly sequential in nature. In this paper, we propose a modification to the MIRA assembler that allows this computation to be parallelized while maintaining the quality of the assembly. We implemented the modified MIRA assembler on a 64-core system with eight Intel(R) Xeon(R) X7560 processors. We were able to speed up the contig-building phase by a factor of 55 on the 64-core system. Additionally, we parallelized the other phases of the MIRA assembler and were able to reduce the total execution time of assembly from a sequential 18.3 hours to 3.4 hours (speedup of 5.57) without sacrificing assembly quality. It is worth noting that the overall speedup is limited by Amdahl's Law, as parts of the original MIRA assembler are inherently sequential. For example, for one million reads, the sequential portion of the MIRA assembler takes about 2.78 hours doing I/O or other operations, which limits the overall speedup to 6.58.
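The Amdahl's Law bound quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the 18.3 hours not spent in the 2.78-hour sequential portion would parallelize perfectly across cores, which a real run will not achieve.

```python
def amdahl_speedup(total_hours, sequential_hours, cores):
    """Idealized Amdahl's Law speedup for a given core count.

    Assumption: everything outside the sequential portion scales
    perfectly with the number of cores.
    """
    parallel_hours = total_hours - sequential_hours
    return total_hours / (sequential_hours + parallel_hours / cores)


def amdahl_limit(total_hours, sequential_hours):
    """Upper bound on speedup as the core count grows without limit."""
    return total_hours / sequential_hours


# Figures quoted in the abstract for one million reads.
TOTAL_HOURS = 18.3       # sequential run time of the original assembler
SEQUENTIAL_HOURS = 2.78  # portion reported as inherently sequential

print(round(amdahl_limit(TOTAL_HOURS, SEQUENTIAL_HOURS), 2))  # 6.58
print(amdahl_speedup(TOTAL_HOURS, SEQUENTIAL_HOURS, 64))      # idealized 64-core value
```

The limit is simply the total time divided by the sequential time, so no number of additional cores can push the overall speedup past that ratio.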
Keywords :
DNA; biology computing; genomics; multiprocessing systems; parallel processing; public domain software; Amdahl's law; I/O operations; Intel(R) Xeon(R) X7560 processors; MIRA open source assembler; assembly quality; contig building phase; genome assembly problem; multicore system; organism DNA sequence generate; parallelized computation; total sequential execution time reduction; Assembly; Bioinformatics; Buildings; Computational modeling; Error correction; Genomics; Image edge detection; Multicore parallelism; OLC Graph Model; OpenMP; Parallel Genome Assembly;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/TrustCom.2013.148
Filename :
6680969