DocumentCode
2379348
Title
On staggered checkpointing
Author
Vaidya, Nitin H.
Author_Institution
Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
fYear
1996
fDate
23-26 Oct 1996
Firstpage
572
Lastpage
580
Abstract
A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. Traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce checkpoint overhead. The paper presents a simple approach for staggering the checkpoints arbitrarily. The approach requires that the processes take consistent logical checkpoints, in contrast to the consistent physical checkpoints enforced by existing algorithms. Experimental results on nCube-2 are presented.
Keywords
distributed algorithms; distributed memory systems; fault tolerant computing; hypercube networks; reliability; system recovery; checkpoint overhead reduction; consistent checkpointing algorithm; consistent logical checkpoints; distributed application state; nCube-2; stable storage; staggered checkpointing; Checkpointing; Communication system control; Computer science; Degradation; Delay; Frequency; Upper bound;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing, 1996., Eighth IEEE Symposium on
Conference_Location
New Orleans, LA
Print_ISBN
0-8186-7683-3
Type
conf
DOI
10.1109/SPDP.1996.570386
Filename
570386