DocumentCode :
1269972
Title :
Effective and concurrent checkpointing and recovery in distributed systems
Author :
Hou, C.J. ; Tsoi, K.S. ; Han, C.C.
Author_Institution :
Dept. of Electr. Eng., Ohio State Univ., Columbus, OH, USA
Volume :
144
Issue :
5
fYear :
1997
fDate :
9/1/1997
Firstpage :
304
Lastpage :
316
Abstract :
The paper presents an effective application-transparent checkpointing/rollback scheme for multiple processes that communicate via message passing in a distributed system. The authors first propose a checkpointing scheme that uses the unforced checkpointing strategy and dynamically varies checkpoint intervals with respect to the frequency of message sending to reduce process rollback propagation. Additional forced checkpoints are taken only to achieve checkpoint consistency among processes and to avoid the domino effect. The authors then discuss both global rollback and minimal rollback approaches, and incorporate them into the proposed checkpointing scheme. The combined checkpointing/rollback scheme can handle out-of-order messages, achieve high concurrency during checkpointing/rollback operations, and allow multiple invocations of checkpointing/rollback instances. To reduce the space overhead, a global recovery line determination approach is proposed to purge the checkpoints to which processes will never roll back. Experience with event-driven simulation indicates that the proposed scheme can effectively reduce rollback propagation, while incurring little control message overhead and maintaining only a few checkpoints at each process at any time.
Keywords :
concurrency control; message passing; system recovery; application-transparent checkpointing; concurrent checkpointing; control message overhead; distributed systems; domino effect; event driven simulation; global rollback; minimal rollback; out-of-order messages; process rollback propagation; recovery; rollback scheme
fLanguage :
English
Journal_Title :
IEE Proceedings - Computers and Digital Techniques
Publisher :
IET
ISSN :
1350-2387
Type :
jour
DOI :
10.1049/ip-cdt:19971527
Filename :
627909