Title :
Distributed reset
Author :
Arora, Anish; Gouda, Mohamed
Author_Institution :
Dept. of Comput., Ohio State Univ., Columbus, OH, USA
fDate :
9/1/1994 12:00:00 AM
Abstract :
A reset subsystem is designed that can be embedded in an arbitrary distributed system to allow the system processes to reset the system when necessary. Our design is layered and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense: if the coordination between the up-processes in the system is ever lost (due to failures or repairs of processes and channels), then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels, even when a reset is in progress.
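The layering described in the abstract (leader election, then a spanning tree, then a diffusing computation over that tree) can be illustrated with a small centralized simulation. This is a minimal sketch under stated assumptions, not the authors' protocol. The helper names `build_tree` and `diffusing_reset` are hypothetical. The "largest id wins" leader rule and the BFS tree are assumed simplifications. The sketch runs sequentially and is not self-stabilizing, and it does not model failures that occur while a reset is in progress, which the paper's design does handle.

```python
from collections import deque

def build_tree(adj, up):
    """Hypothetical layers 1 and 2: elect the largest-id up-process as leader
    (an assumed rule) and build a BFS spanning tree over the up-processes
    reachable from it. Assumes `up` is non-empty."""
    leader = max(up)
    parent = {leader: None}
    queue = deque([leader])
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            # Only up-processes join the tree; failed processes are skipped.
            if v in up and v not in parent:
                parent[v] = u
                queue.append(v)
    return leader, parent

def diffusing_reset(leader, parent, state, initial):
    """Hypothetical layer 3: a diffusing computation over the tree. The reset
    wave goes down from the leader (pre-order), and each process reports
    completion to its parent only after all its children have (post-order)."""
    children = {p: [] for p in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    reset_order, done_order = [], []

    def visit(u):
        state[u] = initial          # u resets its local state
        reset_order.append(u)
        for c in children[u]:       # propagate the wave to the subtree
            visit(c)
        done_order.append(u)        # u acknowledges completion upward

    visit(leader)
    return reset_order, done_order

# Usage: five processes in a ring; process 3 has fail-stopped.
adj = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {1, 4}}
up = {1, 2, 4, 5}
state = {p: 42 for p in up}         # arbitrary pre-reset states
leader, parent = build_tree(adj, up)
reset_order, done_order = diffusing_reset(leader, parent, state, 0)
print(leader, reset_order, done_order)  # 5 [5, 1, 2, 4] [2, 1, 4, 5]
```

The leader is always last in `done_order`. It learns that the whole tree has been reset only after every subtree has reported back, which is what makes a diffusing computation a natural fit for detecting reset completion.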
Keywords :
distributed processing; fault tolerant computing; system recovery; channel failures; channel repairs; diffusing computation; distributed reset subsystem; embedded system; fail-stop failure tolerance; fault tolerance; layered design; leader election; process failures; process repairs; reliability; robustness; self-stabilizing components; spanning tree construction; up-process coordination; Communication channels; Computer science; Distributed computing; Fault tolerance; Nominations and elections; Process design; Robustness; Signal processing;
Journal_Title :
Computers, IEEE Transactions on