DocumentCode
3336319
Title
Reduced overhead logging for rollback recovery in distributed shared memory
Author
Suri, G. ; Janssens, B. ; Fuchs, W.K.
Author_Institution
AT&T Bell Labs., Murray Hill, NJ, USA
fYear
1995
fDate
27-30 June 1995
Firstpage
279
Lastpage
288
Abstract
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems cannot directly apply message-passing logging techniques because they use inherently nondeterministic asynchronous communication. This paper presents new logging schemes that reduce the typically high overhead for logging in DSM. Our algorithm for sequentially consistent systems tracks rather than logs accesses to shared memory. In an extension of this method to lazy release consistency, the per-access overhead of tracking has been completely eliminated. Measurements with parallel applications show a significant reduction in failure-free overhead.
Keywords
data loggers; distributed memory systems; fault tolerant computing; shared memory systems; system recovery; deterministic replay; distributed shared memory; failed node recovery; failure-free overhead; lazy release consistency; message logging; nondeterministic asynchronous communication; overhead logging; parallel systems; per-access overhead; rollback recovery; sequentially consistent systems; shared memory access tracking; Asynchronous communication; Checkpointing; Computer crashes; Concurrent computing; Distributed computing; Distributed processing; Hardware; Laboratories; Message passing; NASA;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fault-Tolerant Computing, 1995. FTCS-25. Digest of Papers., Twenty-Fifth International Symposium on
Conference_Location
Pasadena, CA, USA
Print_ISBN
0-8186-7079-7
Type
conf
DOI
10.1109/FTCS.1995.466971
Filename
466971