Title :
On checkpointing strategies in unreliable computing environments
Author_Institution :
C&F Search Marketing, Miami, FL, USA
Abstract :
In this paper, we analyze the performance implications of checkpointing strategies in unreliable computing environments. We show that if an appropriate checkpointing strategy is not chosen, the job completion time follows a heavy-tailed distribution, which can lead to long and highly variable completion times. We obtain asymptotics for job completion times under three schemes (no checkpointing, a fixed number of random checkpoints, and checkpoints at fixed intervals) for various task time distributions. The asymptotic results are derived using large deviation theory.
Keywords :
checkpointing; reliability; ubiquitous computing; asymptotics; checkpointing strategies; deviation theory; fixed intervals; heavy tailed distributed system; job completion times; random checkpoints; task time distribution; unreliable computing environments; Computational modeling; Equations; Markov processes; Mathematical model; Random variables; Tin; RESTART; failure; heavy-tail; large deviation theory; pri; recovery; unreliable systems;
Conference_Title :
Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), 2011 IEEE 6th International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-1426-9
DOI :
10.1109/IDAACS.2011.6072739