DocumentCode :
2724418
Title :
Intelligent Selection of Fault Tolerance Techniques on the Grid
Author :
Vanderster, Daniel C. ; Dimopoulos, Nikitas J. ; Sobie, Randall J.
Author_Institution :
Univ. of Victoria, Victoria
fYear :
2007
fDate :
10-13 Dec. 2007
Firstpage :
69
Lastpage :
76
Abstract :
The emergence of computational grids has led to an increased reliance on task schedulers that can guarantee the completion of tasks executed on unreliable systems. There are three common techniques for providing task-level fault tolerance on a grid: retrying, replicating, and checkpointing. While these techniques are varyingly successful at providing resilience to faults, each presents a tradeoff between performance and resource cost. As such, tasks with differing urgency requirements would ideally be placed using the technique best suited to them; for example, urgent tasks are likely to prefer the replication technique, which guarantees timely completion, whereas low-priority tasks should not incur any extra resource cost in the name of fault tolerance. This paper introduces a placement and selection strategy which, by computing the utility of each fault tolerance technique in relation to a given task, finds the set of allocation options that optimizes the global utility. Heuristics that take into account the value offered by a user, the estimated resource cost, and the estimated response time of an option are presented. Simulation results show that the resulting allocations improve fault tolerance, runtime, and profit, and allow users to prioritize their tasks.
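The abstract describes scoring each fault tolerance technique by its utility for a given task, using the value offered by the user, the estimated resource cost, and the estimated response time. The Python sketch below illustrates that idea only; it is not the paper's method. The `Option` fields, the linearly decaying `value_offered` function, the flat price per CPU-hour, and all numbers are hypothetical. It also assumes independent tasks and unconstrained resources, so picking the best option per task maximizes the summed (global) utility; the paper's actual placement strategy and heuristics are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One hypothetical way to run a task with a given fault tolerance technique."""
    technique: str             # "retry", "replicate", or "checkpoint"
    est_response_time: float   # estimated hours until completion (illustrative)
    est_resource_cost: float   # estimated CPU-hours consumed (illustrative)


def value_offered(task_value: float, deadline: float, response_time: float) -> float:
    """Hypothetical value curve: full value by the deadline, decaying linearly to 0 at 2x deadline."""
    if response_time <= deadline:
        return task_value
    if response_time >= 2 * deadline:
        return 0.0
    return task_value * (2 * deadline - response_time) / deadline


def utility(task_value: float, deadline: float, price: float, opt: Option) -> float:
    """Utility = value the user offers for this response time minus the cost of resources used."""
    return value_offered(task_value, deadline, opt.est_response_time) - price * opt.est_resource_cost


def select(task_value: float, deadline: float, price: float, options: list[Option]) -> Option:
    """Choose the option with the highest utility for a single task."""
    return max(options, key=lambda o: utility(task_value, deadline, price, o))


def allocate(tasks: list[tuple[float, float, float]], options: list[Option]):
    """Per-task selection; maximizes total utility only under the independence assumption."""
    choices = [select(v, d, p, options) for v, d, p in tasks]
    total = sum(utility(v, d, p, o) for (v, d, p), o in zip(tasks, choices))
    return choices, total


# Illustrative estimates: replication is fast but costly, retrying is slow but cheap.
OPTIONS = [
    Option("retry", est_response_time=2.5, est_resource_cost=1.2),
    Option("checkpoint", est_response_time=1.8, est_resource_cost=1.4),
    Option("replicate", est_response_time=1.0, est_resource_cost=3.0),
]

if __name__ == "__main__":
    urgent = select(10.0, 1.0, 1.0, OPTIONS)     # high value, tight deadline
    background = select(2.0, 4.0, 1.0, OPTIONS)  # low value, loose deadline
    print(urgent.technique, background.technique)  # replicate retry
```

With these made-up numbers, the urgent task favors replication and the low-priority task favors retrying, mirroring the tradeoff the abstract describes.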
Keywords :
grid computing; software fault tolerance; computational grids; intelligent selection; task-level fault tolerance; Checkpointing; Computational intelligence; Computational modeling; Costs; Delay; Fault tolerance; Grid computing; Processor scheduling; Resilience; Runtime
fLanguage :
English
Publisher :
IEEE
Conference_Title :
e-Science and Grid Computing, IEEE International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-0-7695-3064-2
Type :
conf
DOI :
10.1109/E-SCIENCE.2007.45
Filename :
4426873