Title :
Adaptive checkpointing in dynamic grids for uncertain job durations
Author :
Chtepen, Maria ; Dhoedt, Bart ; De Turck, Filip ; Demeester, Piet ; Claeys, Filip H A ; Vanrolleghem, Peter A.
Author_Institution :
INTEC-IBBT, Ghent Univ., Ghent, Belgium
Abstract :
Adaptive checkpointing is a relatively new approach that is particularly suitable for providing fault tolerance in dynamic and unstable grid environments. The approach allows checkpointing intervals to be modified periodically at run time, as additional information becomes available. This paper introduces an adaptive algorithm, named MeanFailureCP+, for checkpointing grid applications whose execution times are unknown a priori. The algorithm modifies its parameters based on dynamically collected feedback on its performance. Simulation results show that the new algorithm performs even better than adaptive approaches that make use of exact information on job execution times.
Keywords :
grid computing; software fault tolerance; MeanFailureCP+; adaptive algorithm; adaptive checkpointing; fault-tolerance; uncertain job duration; Checkpointing; Computational modeling; Computer networks; Feedback; Job design; Resource management; Runtime
Conference_Title :
Information Technology Interfaces, 2009. ITI '09. Proceedings of the ITI 2009 31st International Conference on
Conference_Location :
Dubrovnik
Print_ISBN :
978-953-7138-15-8
DOI :
10.1109/ITI.2009.5196152