DocumentCode
953940
Title
Pluribus—An operational fault-tolerant multiprocessor
Author
Katsuki, David ; Elsam, Eric S. ; Mann, William F. ; Roberts, Eric S. ; Robinson, John G. ; Skowronski, F. Stanley ; Wolf, Eric W.
Author_Institution
Bolt Beranek and Newman Inc., Cambridge, MA
Volume
66
Issue
10
fYear
1978
Firstpage
1146
Lastpage
1159
Abstract
The authors describe the Pluribus multiprocessor system, outline several techniques used to achieve fault-tolerance, describe their field experience to date, and mention some potential applications. The Pluribus system places the major responsibility for recovery from failures on the software. Failing hardware modules are removed from the system, spare modules are substituted where available, and appropriate initialization is performed. In applications where the goal is maximum availability rather than totally fault-free operation, this approach represents a considerable savings in complexity and cost over traditional implementations. The software-based reliability approach has been extended to provide error-handling and recovery mechanisms for the system software structures as well. A number of Pluribus systems have been built and are currently in operation. Experience with these systems has given us confidence in their performance and maintainability, and leads us to suggest other applications that might benefit from this approach.
Keywords
ARPANET; Error correction; Fasteners; Fault tolerance; Fault tolerant systems; Hardware; Helium; Light emitting diodes; Microcomputers; Multiprocessing systems;
fLanguage
English
Journal_Title
Proceedings of the IEEE
Publisher
ieee
ISSN
0018-9219
Type
jour
DOI
10.1109/PROC.1978.11109
Filename
1455378