Title :
Evaluating distributed checkpointing protocols
Author :
Agbaria, Adnan ; Freund, Ari ; Friedman, Roy
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Abstract :
This paper presents an objective measure, called overhead ratio, for evaluating distributed checkpointing protocols. This measure extends previous evaluation schemes by incorporating several additional parameters that are inherent in distributed environments. In particular, we take into account the rollback propagation of the protocol, which affects the length of the recovery process and therefore the expected program run-time in executions that involve failures and recoveries. The paper also analyzes several known protocols and compares their overhead ratios.
Keywords :
distributed processing; protocols; system recovery; distributed checkpointing protocol evaluation; distributed environment; overhead ratio; Application software; Checkpointing; Computer science; Coordinate measuring machines; Costs; Debugging; Distributed computing; Fault tolerant systems; Protocols; Runtime
Conference_Title :
Distributed Computing Systems, 2003. Proceedings. 23rd International Conference on
Print_ISBN :
0-7695-1920-2
DOI :
10.1109/ICDCS.2003.1203475