DocumentCode
1151444
Title
Distributed reset
Author
Arora, Anish ; Gouda, Mohamed
Author_Institution
Dept. of Comput., Ohio State Univ., Columbus, OH, USA
Volume
43
Issue
9
fYear
1994
fDate
9/1/1994 12:00:00 AM
Firstpage
1026
Lastpage
1038
Abstract
A reset subsystem is designed that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary. Our design is layered, and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense: if the coordination between the up-processes in the system is ever lost (due to failures or repairs of processes and channels), then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels, even when a reset is in progress.
Keywords
distributed processing; fault tolerant computing; system recovery; channel failures; channel repairs; diffusing computation; distributed reset subsystem; embedded system; fail-stop failure tolerance; fault tolerance; layered design; leader election; process failures; process repairs; reliability; robustness; self-stabilizing components; spanning tree construction; up-process coordination; Communication channels; Computer science; Distributed computing; Fault tolerance; Nominations and elections; Process design; Robustness; Signal processing;
fLanguage
English
Journal_Title
Computers, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9340
Type
jour
DOI
10.1109/12.312126
Filename
312126
Link To Document