DocumentCode :
1536058
Title :
Quasi-synchronous checkpointing: Models, characterization, and classification
Author :
Manivannan, D. ; Singhal, Mukesh
Author_Institution :
Dept. of Comput. Sci., Kentucky Univ., Lexington, KY, USA
Volume :
10
Issue :
7
fYear :
1999
fDate :
7/1/1999 12:00:00 AM
Firstpage :
703
Lastpage :
713
Abstract :
Checkpointing algorithms are classified in the literature as either synchronous or asynchronous. In synchronous checkpointing, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always maintained in the system. Synchronizing checkpointing activity involves message overhead, and process execution may have to be suspended during the checkpointing coordination, resulting in performance degradation. In asynchronous checkpointing, processes take checkpoints without any coordination with others. Asynchronous checkpointing provides maximum autonomy for processes to take checkpoints; however, some of the checkpoints taken may not lie on any consistent global checkpoint, making the checkpointing effort useless. Asynchronous checkpointing algorithms in the literature can reduce the number of useless checkpoints by making processes take communication-induced checkpoints in addition to asynchronous checkpoints. We call such algorithms quasi-synchronous. In this paper, we present a theoretical framework for characterizing and classifying such algorithms. The theory not only helps to classify and characterize the quasi-synchronous checkpointing algorithms, but also helps to analyze the properties and limitations of the algorithms belonging to each class. It also provides guidelines for designing and evaluating such algorithms.
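The idea of adding communication-induced checkpoints to asynchronous ones can be illustrated with a simple index-based (sequence-number) scheme. This is a minimal sketch under stated assumptions. It is not the algorithm or the classification from this paper, and the names `Process`, `checkpoint_position`, `is_consistent`, and `simulate` are hypothetical. Each process numbers its checkpoints and piggybacks its current index on every message. If a message arrives carrying a larger index, the receiver takes a forced checkpoint with that index before delivering the message. A global checkpoint is treated as consistent when it contains no orphan message, meaning a message recorded as received but not as sent.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Process:
    pid: int
    sn: int = 0                                # index of the latest local checkpoint (assumed scheme)
    log: list = field(default_factory=list)    # local events in order: ("ckpt", idx) / ("send", id) / ("recv", id)

    def __post_init__(self):
        self.log.append(("ckpt", 0))           # every process starts with an initial checkpoint of index 0

    def basic_checkpoint(self):
        # Asynchronous (basic) checkpoint taken on the process's own initiative.
        self.sn += 1
        self.log.append(("ckpt", self.sn))

    def send(self, msg_id):
        self.log.append(("send", msg_id))
        return self.sn                         # index piggybacked on the outgoing message

    def receive(self, msg_id, piggyback):
        if piggyback > self.sn:
            # Communication-induced (forced) checkpoint, taken BEFORE delivery.
            self.sn = piggyback
            self.log.append(("ckpt", self.sn))
        self.log.append(("recv", msg_id))

def checkpoint_position(proc, k):
    """Log position of proc's earliest checkpoint with index >= k, or None."""
    for pos, (kind, val) in enumerate(proc.log):
        if kind == "ckpt" and val >= k:
            return pos
    return None

def is_consistent(procs, cut):
    """cut maps pid -> log position of the chosen checkpoint; True iff no orphan message."""
    send_at, recv_at = {}, {}
    for p in procs:
        for pos, (kind, val) in enumerate(p.log):
            if kind == "send":
                send_at[val] = (p.pid, pos)
            elif kind == "recv":
                recv_at[val] = (p.pid, pos)
    for msg, (q, rpos) in recv_at.items():
        p, spos = send_at[msg]
        if rpos < cut[q] and spos > cut[p]:
            return False                       # received before q's checkpoint, sent after p's
    return True

def simulate(n=3, steps=200, seed=1):
    """Random run with basic checkpoints, sends, and out-of-order deliveries (n >= 2)."""
    rng = random.Random(seed)
    procs = [Process(i) for i in range(n)]
    in_transit = []                            # (msg_id, destination, piggybacked index)
    next_id = 0
    for _ in range(steps):
        action, p = rng.random(), rng.choice(procs)
        if action < 0.2:
            p.basic_checkpoint()
        elif action < 0.6:
            dest = rng.choice([q for q in procs if q is not p])
            in_transit.append((next_id, dest, p.send(next_id)))
            next_id += 1
        elif in_transit:
            msg_id, dest, pb = in_transit.pop(rng.randrange(len(in_transit)))
            dest.receive(msg_id, pb)
    return procs
```

In this sketch, for any index k that every process has reached, taking each process's earliest checkpoint with index at least k yields a consistent global checkpoint. A message sent after such a checkpoint carries an index of at least k, so the forced checkpoint guarantees that the receiver has already passed a checkpoint with index at least k before the message is delivered. Mixing checkpoints with unrelated indices can still produce an orphan message, which is why checkpoints that cannot be combined this way risk becoming useless.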
Keywords :
fault tolerant computing; system recovery; classification; maximum autonomy; message overhead; performance degradation; process execution; quasi-synchronous checkpointing; Algorithm design and analysis; Application software; Checkpointing; Computer industry; Condition monitoring; Degradation; Distributed computing; Fault tolerance; Guidelines; Software debugging;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/71.780865
Filename :
780865