Title :
Optimistic message logging for independent checkpointing in message-passing systems
Author :
Wang, Yi-Min ; Fuchs, W. Kent
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Abstract :
Message-passing systems with a communication protocol transparent to the applications typically require message logging to ensure consistency between checkpoints. A periodic independent checkpointing scheme with optimistic logging is described; it is designed to reduce performance degradation during normal execution while keeping the recovery cost acceptable. Both the time and space overhead of message logging can be reduced by detecting messages that need not be logged. A checkpoint space reclamation algorithm is presented that reclaims all checkpoints that are not useful for any possible future recovery. Communication trace-driven simulation of several hypercube programs is used to evaluate the techniques.
Keywords :
fault tolerant computing; hypercube networks; message passing; multiprocessing programs; protocols; checkpoint space reclamation algorithm; communication protocol; communication trace-driven simulation; hypercube programs; message-passing systems; optimistic message logging; performance degradation; recovery cost; Access protocols; Checkpointing; Costs; Degradation; Hypercubes; Message passing; NASA; Runtime; Software systems
Conference_Title :
Reliable Distributed Systems, 1992. Proceedings., 11th Symposium on
Conference_Location :
Houston, TX
Print_ISBN :
0-8186-2890-1
DOI :
10.1109/RELDIS.1992.235132