Title :
Implementation of recoverable distributed shared memory by logging writes
Author :
Kanthadai, Sundar ; Welch, Jennifer L.
Author_Institution :
Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
Abstract :
Distributed shared memory (DSM), by avoiding the programming complexities of message passing, has become a convenient model to work with. But the benefits of these systems can be achieved only if the whole system behaves like a failure-free system. Many algorithms that have been proposed for implementing a reliable DSM require the processes to take checkpoints whenever there is a data transfer, resulting in heavy overhead during failure-free execution. We present an algorithm to provide recoverable DSM for sequential consistency in which the checkpoint interval can be tailored to balance the cost of checkpointing against the savings in recovery obtained by taking checkpoints often. Compared with previous recovery techniques that use logging, both the logging and the message overheads are reduced. The algorithm can tolerate up to n faults, where n is the number of processes, and can be used in an environment where the cost of synchronizing the checkpoints is high.
Keywords :
distributed memory systems; fault tolerant computing; message passing; shared memory systems; system recovery; DSM; checkpointing; data transfer; distributed shared memory; logging writes; recoverable; sequential consistency; Bandwidth; Checkpointing; Computer science; Costs; Fault tolerant systems; Message passing; Parallel programming; Power engineering and energy
Conference_Title :
Distributed Computing Systems, 1996., Proceedings of the 16th International Conference on
Print_ISBN :
0-8186-7399-0
DOI :
10.1109/ICDCS.1996.507908