DocumentCode
1536051
Title
Staggered consistent checkpointing
Author
Vaidya, Nitin H.
Author_Institution
Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
Volume
10
Issue
7
fYear
1999
fDate
7/1/1999 12:00:00 AM
Firstpage
694
Lastpage
702
Abstract
A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. The traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce checkpoint overhead. This paper presents a simple approach to arbitrarily stagger the checkpoints. Our approach requires that the processes take consistent logical checkpoints, as compared to consistent physical checkpoints enforced by existing algorithms. Experimental results on nCube-2 are presented.
Keywords
fault tolerant computing; parallel architectures; system recovery; consistent checkpointing staggering; logical checkpoints; nCube-2; stable storage; Checkpointing; Communication system control; Degradation; Fault tolerance; Frequency; Writing
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/71.780864
Filename
780864