DocumentCode :
1146937
Title :
Distributed Reconfiguration Strategies for Fault-Tolerant Multiprocessor Systems
Author :
Clarke, Edmund M. ; Nikolaou, Christos N.
Author_Institution :
Center for Research in Computing Technology, Harvard University
Issue :
8
fYear :
1982
Firstpage :
771
Lastpage :
784
Abstract :
In this paper, we investigate strategies for dynamically reconfiguring shared memory multiprocessor systems that are subject to common memory faults and unpredictable processor deaths. These strategies aim at determining a communication page, i.e., a page of common memory that can be used by a group of processors for storing crucial common resources such as global locks for synchronization and global data structures for voting algorithms. To ensure system reliability, the reconfiguration strategies must be distributed so that each processor independently arrives at exactly the same choice. This type of reconfiguration strategy is currently used in the STAGE operating system on the PLURIBUS multiprocessor [5]. We analyze the weak points of the PLURIBUS algorithm and examine alternative strategies satisfying optimization criteria such as maximization of the number of processors and the number of common memory pages in the reconfigured system. We also present a general distributed algorithm which enables the processors in such a system to exchange the local information that is needed to reach a consensus on system reconfiguration.
Keywords :
Communication page; fault-tolerance; multiprocessor systems; reconfiguration strategies; Algorithm design and analysis; Application software; Data structures; Distributed algorithms; Fault tolerant systems; Multiprocessing systems; Operating systems; Real time systems; Reliability; Voting
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.1982.1676083
Filename :
1676083