Title :
Fault tolerant scheduling of precedence task graphs on heterogeneous platforms
Author :
Benoit, Anne ; Hakem, Mourad ; Robert, Yves
Author_Institution :
LIP Lab. UMR 5668, ENS Lyon - CNRS - INRIA - UCBL, Lyon
Abstract :
Fault tolerance and latency are important requirements in several time-critical applications: such applications require guarantees in terms of latency, even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs on heterogeneous systems. Our approach is based on an active replication scheme capable of supporting ε arbitrary fail-silent (fail-stop) processor failures; hence valid results will be provided even if ε processors fail. We focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Major achievements include low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the FTBAR scheduling algorithm [3].
Keywords :
failure analysis; scheduling; software fault tolerance; active replication scheme; arbitrary fail-silent processor failures; fault tolerant scheduling; heterogeneous platforms; precedence task graphs; Computer applications; Concurrent computing; Cost function; Delay; Distributed computing; Fault tolerance; Fault tolerant systems; Laboratories; Processor scheduling; Scheduling algorithm; heterogeneous systems; multi-criteria scheduling; reliability
Conference_Title :
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-1693-6
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2008.4536133