Title :
Data throttling for data-intensive workflows
Author :
Park, Sang-Min ; Humphrey, Marty
Author_Institution :
Dept. of Comput. Sci., Univ. of Virginia, Charlottesville, VA
Abstract :
Existing workflow systems attempt to achieve high performance by intelligently scheduling tasks on resources, sometimes even attempting to move the largest data files on the highest-capacity links. However, such approaches are inherently limited: they offer only minimal control over the arrival time and rate of data transfers between nodes. The result is unbalanced workflows in which one task sits idle while waiting for data to arrive. This paper describes a data throttling framework that workflow systems can exploit to regulate the rate of data transfers between workflow tasks via a specially created QoS-enabled GridFTP server. Our workflow planner constructs a schedule that specifies both when and where individual tasks are to be executed and when and at what rate data is to be transferred. Simulation results involving a simple workflow indicate that our system can achieve a 30% speedup when nodes show a computation/communication ratio of approximately 0.5. We reinforce and confirm these results via an actual wide-area implementation of the Montage workflow, obtaining a maximum speedup of 31% and an average speedup of 16%. Overall, we believe that our data throttling grid workflow system both executes workflows more efficiently (by better establishing balanced workflow graphs) and operates more cooperatively with unrelated concurrent grid activities by consuming less overall network bandwidth, allowing those activities to execute more efficiently as well.
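The core idea of throttling, transferring data only as fast as the consuming task needs it rather than as fast as the link allows, can be illustrated with a minimal sketch. The `Transfer` type, its field names, and the rule of choosing the lowest rate that still delivers the data by the consumer's scheduled start are illustrative assumptions. They are not the authors' planner or the interface of their QoS-enabled GridFTP server.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical description of one producer-to-consumer data transfer."""
    size_mb: float                # amount of data to move (MB)
    start_s: float                # time the producer's output becomes available (s)
    needed_by_s: float            # time the consumer task is scheduled to start (s)
    link_capacity_mb_s: float     # maximum achievable rate on the link (MB/s)


def throttled_rate(t: Transfer) -> float:
    """Return the lowest rate (MB/s) that still delivers the data by needed_by_s.

    Assumed policy: if the deadline cannot be met even at full link capacity
    (or there is no time window at all), run the transfer unthrottled.
    """
    window = t.needed_by_s - t.start_s
    if window <= 0:
        return t.link_capacity_mb_s
    required = t.size_mb / window
    return min(required, t.link_capacity_mb_s)


def spare_bandwidth(t: Transfer) -> float:
    """Bandwidth (MB/s) left free for unrelated activity while the transfer runs."""
    return t.link_capacity_mb_s - throttled_rate(t)


if __name__ == "__main__":
    # 500 MB needed 100 s after it is produced, on a 20 MB/s link:
    # 5 MB/s suffices, leaving 15 MB/s for concurrent grid activity.
    relaxed = Transfer(size_mb=500, start_s=0, needed_by_s=100, link_capacity_mb_s=20)
    print(throttled_rate(relaxed), spare_bandwidth(relaxed))
```

The sketch reflects the abstract's two claimed benefits in miniature: the consumer still receives its input on time, and whatever rate the throttled transfer does not use remains available to unrelated concurrent activity.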
Keywords :
concurrent engineering; data analysis; grid computing; scheduling; workflow management software; QoS-enabled GridFTP server; concurrent Grid activities; data throttling; data transfer; data-intensive workflows; Astronomy; Bandwidth; Collaborative work; Computational modeling; Computer science; Engines; Logic; Physics; Processor scheduling; Space technology;
Conference_Title :
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-1693-6
ISSN :
1530-2075
DOI :
10.1109/IPDPS.2008.4536306