DocumentCode :
431068
Title :
CCUML: a checkpointing protocol for distributed system processes
Author :
Neogy, Sarmistha ; Sinha, Anupam ; Das, Pradip K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Calcutta, India
Volume :
B
fYear :
2004
fDate :
21-24 Nov. 2004
Firstpage :
553
Abstract :
This paper presents CCUML (coordinated checkpointing with unacknowledged message logging), a checkpointing protocol. A checkpoint initiator initiates the taking of checkpoints at the end of each checkpoint interval. Processes take local checkpoints only after being notified by the initiator. There is no central initiator, however; each process takes a turn acting as the initiator at successive checkpoint initiations. The guarantee that no message is lost in case of failure is achieved by maintaining a log of unacknowledged messages along with the latest checkpoint in each process. Since only unacknowledged messages are logged, the overhead is negligible. Thus, the distributed checkpointing protocol described here always ensures a consistent set of checkpoints from which processes can resume during recovery after a fault.
Keywords :
checkpointing; fault tolerant computing; message passing; CCUML; checkpoint initiator; coordinated checkpointing-unacknowledged message logging; distributed system processes; Checkpointing; Protocols;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
TENCON 2004. 2004 IEEE Region 10 Conference
Print_ISBN :
0-7803-8560-8
Type :
conf
DOI :
10.1109/TENCON.2004.1414655
Filename :
1414655
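The abstract names three mechanisms: an initiator role that rotates among processes, local checkpoints taken on notification, and a log of unacknowledged messages stored with the latest checkpoint. The following is a minimal Python sketch of those ideas, assuming round-robin rotation by round number and modeling notification as a direct method call. The names `Process`, `initiator_for`, and `run_checkpoint_round` are hypothetical, and the sketch is not the authors' implementation; the paper's actual message formats and consistency mechanism are not given in this record.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Process:
    pid: int
    state: int = 0                                # stand-in for application state
    unacked: dict = field(default_factory=dict)   # msg_id -> (dest_pid, payload)
    checkpoint: Optional[tuple] = None            # (state, copy of unacked log)

    def send(self, msg_id, dest_pid, payload):
        # Log the message until the receiver acknowledges it.
        self.unacked[msg_id] = (dest_pid, payload)

    def on_ack(self, msg_id):
        # Acknowledged messages no longer need to be logged.
        self.unacked.pop(msg_id, None)

    def take_checkpoint(self):
        # Save the state together with the current unacknowledged-message log.
        self.checkpoint = (self.state, dict(self.unacked))

    def recover(self):
        # Restore the last checkpoint and return the messages to resend.
        state, log = self.checkpoint
        self.state = state
        self.unacked = dict(log)
        return list(log.items())


def initiator_for(round_no, processes):
    # Assumption: the initiator role rotates in round-robin order.
    return processes[round_no % len(processes)]


def run_checkpoint_round(round_no, processes):
    # The initiator notifies every process, which then checkpoints locally.
    initiator = initiator_for(round_no, processes)
    for p in processes:
        p.take_checkpoint()
    return initiator.pid
```

In this sketch a process that fails after a checkpoint round calls `recover()` and resends whatever was still unacknowledged at checkpoint time, which is how the log keeps in-flight messages from being lost.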