DocumentCode :
1504534
Title :
Coordinating Computation and I/O in Massively Parallel Sequence Search
Author :
Lin, Heshan ; Ma, Xiaosong ; Feng, Wuchun ; Samatova, Nagiza F.
Author_Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
Volume :
22
Issue :
4
fYear :
2011
fDate :
4/1/2011 12:00:00 AM
Firstpage :
529
Lastpage :
543
Abstract :
With the explosive growth of genomic information, searching sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search exhibits highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have each been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications in the context of genomic sequence search. Our study reveals that a lack of coordination between computation scheduling and I/O optimization can result in severe performance problems. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by coordinating the dynamic load balancing of computation with high-performance noncontiguous I/O.
Keywords :
biology computing; file organisation; genetics; input-output programs; parallel processing; resource allocation; scheduling; I/O optimization; I/O patterns; I/O scheduling; computation scheduling; data intensive scientific application; dynamic load balancing; genomic information; integrated scheduling; irregular computation; irregular scientific applications; massively parallel computers; massively parallel sequence search; noncontiguous file access optimization; parallel genomic sequence search; performance issues; runtime irregularities; scalable sequence-search tools; sequence database; BLAST; bioinformatics; parallel I/O;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
ieee
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2010.101
Filename :
5473216