DocumentCode :
2175258
Title :
Fault-Tolerance in Dataflow-Based Scientific Workflow Management
Author :
Yildiz, Ustun ; Mouallem, Pierre ; Vouk, Mladen ; Crawl, Daniel ; Altintas, Ilkay
Author_Institution :
Univ. of California, Davis, CA, USA
fYear :
2010
fDate :
5-10 July 2010
Firstpage :
336
Lastpage :
343
Abstract :
This paper addresses the challenges of providing fault-tolerance in scientific workflow management. The specification and handling of faults in scientific workflows should be defined precisely in order to ensure the consistent execution against the process-specific requirements. We identified a number of typical failure patterns that occur in real-life scientific workflow executions. Following the intuitive recovery strategies that correspond to the identified patterns, we developed the methodologies that integrate recovery fragments into fault-prone scientific workflow models. Compared to the existing fault-tolerance mechanisms, the propositions reduce the effort of workflow designers by defining recovery fragments automatically. Furthermore, the developed framework implements the necessary mechanisms to capture the faults from the different layers of a scientific workflow management architecture. Experience indicates that the framework can be employed effectively to model, capture and tolerate the typical failure patterns that we identified.
Keywords :
fault tolerant computing; natural sciences computing; workflow management software; dataflow based scientific workflow management; fault tolerance; recovery fragments; Biological system modeling; Data models; Data structures; Fault tolerance; Fault tolerant systems; Monitoring; Pipelines; Scientific Workflow Patterns Kepler;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Services (SERVICES-1), 2010 6th World Congress on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-8199-6
Electronic_ISBN :
978-0-7695-4129-7
Type :
conf
DOI :
10.1109/SERVICES.2010.93
Filename :
5577254