DocumentCode
2979044
Title
Fault-tolerance using cache-coherent distributed shared memory systems
Author
Hecht, D.L. ; Kavi, K.M. ; Gaede, R.K. ; Katsinis, C.
Author_Institution
Alabama Univ., Huntsville, AL, USA
fYear
1999
fDate
1999
Firstpage
100
Lastpage
105
Abstract
Describes new protocols that augment traditional cache-coherency mechanisms to implement fault tolerance based on recovery blocks and checkpointing. Concurrent processes complicate rollback recovery, since a rollback can potentially lead to a “domino effect” whereby the processes are rolled back to the beginning. Several approaches have been proposed to limit the domino effect. One set of such techniques requires communicating processes to synchronize periodically in order to checkpoint a globally consistent state. These schemes can be implemented more naturally on distributed shared memory systems by using synchronization on shared memory. We have developed extensions to well-known cache-coherency methods (e.g., directory-based) for checkpointing consistent states.
Keywords
cache storage; coherence; distributed shared memory systems; fault tolerant computing; memory protocols; synchronisation; system recovery; cache-coherent distributed shared memory systems; checkpointing; communicating process synchronization; concurrent processes; directory-based cache-coherency methods; domino effect; fault tolerance; globally consistent state; protocols; recovery blocks; rollback recovery; Decision support systems; Fault tolerant systems
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures, Algorithms, and Networks, 1999. (I-SPAN '99) Proceedings. Fourth International Symposium on
Conference_Location
Perth/Fremantle, WA
ISSN
1087-4089
Print_ISBN
0-7695-0231-8
Type
conf
DOI
10.1109/ISPAN.1999.778924
Filename
778924