  • DocumentCode
    953940
  • Title
    Pluribus—An operational fault-tolerant multiprocessor
  • Author
    Katsuki, David ; Elsam, Eric S. ; Mann, William F. ; Roberts, Eric S. ; Robinson, John G. ; Skowronski, F. Stanley ; Wolf, Eric W.
  • Author_Institution
    Bolt Beranek and Newman Inc., Cambridge, MA
  • Volume
    66
  • Issue
    10
  • fYear
    1978
  • Firstpage
    1146
  • Lastpage
    1159
  • Abstract
    The authors describe the Pluribus multiprocessor system, outline several techniques used to achieve fault tolerance, describe their field experience to date, and mention some potential applications. The Pluribus system places the major responsibility for recovery from failures on the software. Failing hardware modules are removed from the system, spare modules are substituted where available, and appropriate initialization is performed. In applications where the goal is maximum availability rather than totally fault-free operation, this approach represents considerable savings in complexity and cost over traditional implementations. The software-based reliability approach has been extended to provide error-handling and recovery mechanisms for the system software structures as well. A number of Pluribus systems have been built and are currently in operation. Experience with these systems has given us confidence in their performance and maintainability, and leads us to suggest other applications that might benefit from this approach.
  • Keywords
    ARPANET; Error correction; Fault tolerance; Fault tolerant systems; Hardware; Microcomputers; Multiprocessing systems
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type
    jour
  • DOI
    10.1109/PROC.1978.11109
  • Filename
    1455378