Title :
Efficient and Fault-Tolerant Static Scheduling for Grids
Author :
Cichowski, Patrick ; Keller, James
Author_Institution :
Fac. of Math. & Comput. Sci., FernUniv. in Hagen, Hagen, Germany
Abstract :
Static task graphs model a variety of parallel applications and are used to schedule such applications on grid platforms. While the scheduling is static, i.e. done prior to execution, processors might fail or not deliver their expected performance, especially if the grid comprises nodes with donated time, which may be used or shut down by their owners at any time. We extend a prior proposal for fault-tolerant grid scheduling with task duplication to also cover situations where tasks take much longer than the schedule expects, treating this as a special kind of fault. Furthermore, we consider the time for communication between dependent tasks when placing duplicates. We evaluate both scenarios with a simulator that injects faults and slowdowns into processors, using workloads from a benchmark suite of task graphs with a variety of structures. Our results indicate that the overhead in the fault-free case is negligible, that a processor failure mostly increases the schedule makespan only moderately because duplicates can use gaps in the original schedule, and that the effects of a processor slowdown can partly be mitigated by aborting a (slow) task and running its duplicate.
Keywords :
graph theory; grid computing; scheduling; software fault tolerance; system recovery; task analysis; benchmark suite; dependent tasks; efficient static scheduling; fault injection; fault-free case; fault-tolerant static scheduling; grid platforms; parallel applications; processor failure; static task graphs model; task duplication; Dynamic scheduling; Fault tolerance; Fault tolerant systems; Processor scheduling; Program processors; Runtime; Schedules; fault tolerance; grid computing; performance monitoring; static task scheduling;
Conference_Titel :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
DOI :
10.1109/IPDPSW.2013.94