DocumentCode :
3503237
Title :
Evaluating distributed checkpointing protocols
Author :
Agbaria, Adnan ; Freund, Ari ; Friedman, Roy
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
fYear :
2003
fDate :
19-22 May 2003
Firstpage :
266
Lastpage :
273
Abstract :
This paper presents an objective measure, called overhead ratio, for evaluating distributed checkpointing protocols. This measure extends previous evaluation schemes by incorporating several additional parameters that are inherent in distributed environments. In particular, we take into account the rollback propagation of the protocol, which impacts the length of the recovery process, and therefore the expected program run-time in executions that involve failures and recoveries. The paper also analyzes several known protocols and compares their overhead ratio.
Keywords :
distributed processing; protocols; system recovery; distributed checkpointing protocol evaluation; distributed environment; overhead ratio; Application software; Checkpointing; Computer science; Coordinate measuring machines; Costs; Debugging; Distributed computing; Fault tolerant systems; Protocols; Runtime;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 23rd International Conference on Distributed Computing Systems, 2003
ISSN :
1063-6927
Print_ISBN :
0-7695-1920-2
Type :
conf
DOI :
10.1109/ICDCS.2003.1203475
Filename :
1203475