Title :
Optimistic recovery in multi-threaded distributed systems
Author :
Damani, Om P. ; Tarafdar, Ashis ; Garg, Vijay K.
Author_Institution :
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
Abstract :
The problem of recovering distributed systems from crash failures has been widely studied in the context of traditional non-threaded processes. However, extending those solutions to the multi-threaded scenario presents new problems. We identify and address these problems for optimistic logging protocols. There are two natural extensions to optimistic logging protocols in the multi-threaded scenario. The first extension is process-centric, where the points of internal non-determinism caused by threads are logged. The second extension is thread-centric, where each thread is treated as a separate process. The process-centric approach suffers from false causality, while the thread-centric approach suffers from high causality tracking overhead. By observing that the granularity of failures can be different from the granularity of rollbacks, we design a new balanced approach which incurs low causality tracking overhead and also eliminates false causality.
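As a hedged illustration of the false causality that the abstract attributes to process-centric tracking, the sketch below follows one scenario under both granularities. Thread T1 of process P receives a message from Q, and thread T2 of P, which never reads that message, then sends a message to R. When Q crashes and loses that state interval, process-level tracking marks R as orphaned, but thread-level tracking does not. Everything here is a simplified, hypothetical model (the names `Tracker`, `entity`, `orphaned`, and `r_is_orphaned` are illustrative). Dependencies are kept as "highest interval depended on" per tracked entity, with no checkpoints, message logs, or new intervals started on receipt. This is not the authors' protocol and does not implement their balanced approach.

```python
from collections import defaultdict


class Tracker:
    """Hypothetical dependency tracker: entity -> {entity: highest interval depended on}.

    An 'entity' is a whole process under process-centric tracking, or a single
    thread under thread-centric tracking. Simplified model, not the paper's protocol.
    """

    def __init__(self):
        self.deps = defaultdict(dict)

    def send(self, entity):
        # A message piggybacks a copy of the sender's current dependencies.
        return dict(self.deps[entity])

    def receive(self, entity, msg_deps):
        # The receiver merges the piggybacked dependencies (entry-wise max).
        mine = self.deps[entity]
        for e, k in msg_deps.items():
            mine[e] = max(mine.get(e, -1), k)


def entity(process, thread, thread_centric):
    # Thread-centric tracking gives every thread its own entry; process-centric
    # tracking collapses all threads of a process into one entry.
    return f"{process}.{thread}" if thread_centric else process


def orphaned(deps, failed, lost_from):
    # A state is orphaned if it depends on a lost interval (>= lost_from) of `failed`.
    return deps.get(failed, -1) >= lost_from


def r_is_orphaned(thread_centric):
    t = Tracker()
    q = entity("Q", "T0", thread_centric)
    t.deps[q][q] = 5                                   # Q is in state interval 5
    m1 = t.send(q)                                     # Q sends m1 to thread T1 of P
    t.receive(entity("P", "T1", thread_centric), m1)
    m2 = t.send(entity("P", "T2", thread_centric))     # T2 sends m2 to R without reading m1
    r = entity("R", "T0", thread_centric)
    t.receive(r, m2)
    return orphaned(t.deps[r], q, lost_from=5)         # Q crashes, losing interval 5 onward


print(r_is_orphaned(thread_centric=False))  # True: R rolls back due to false causality
print(r_is_orphaned(thread_centric=True))   # False: T2 never depended on Q
```

The thread-centric run avoids the spurious rollback only because every thread gets its own dependency entry, so the tracked state grows with the number of threads. That cost is the causality tracking overhead the abstract contrasts with false causality.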
Keywords :
multi-threading; system recovery; crash failures; distributed systems; multi-threaded; optimistic logging protocols; process-centric; recovering distributed systems; thread-centric; Checkpointing; Computer crashes; Concurrent computing; Electronic switching systems; Protocols; Read only memory; Yarn;
Conference_Title :
Reliable Distributed Systems, 1999. Proceedings of the 18th IEEE Symposium on
Conference_Location :
Lausanne
Print_ISBN :
0-7695-0290-3
DOI :
10.1109/RELDIS.1999.805099