• DocumentCode
    2497401
  • Title

    Dynamic fault tolerance in DCMA - a dynamically configurable multicomputer architecture

  • Author

    Kuefner, H. ; Baehring, H.

  • Author_Institution
    Dept. of Comput. Sci., FernUniversitaet Hagen, Germany
  • fYear
    1996
  • fDate
    23-25 Oct 1996
  • Firstpage
    22
  • Lastpage
    31
  • Abstract
    This paper introduces a new architecture for a fault-tolerant computer system which connects high-end PCs or workstations by a high-speed network. To achieve platform independence, coupling is based on the widely used PCI bus. In contrast to commercially available fault-tolerant systems, we strongly emphasize mechanisms for tolerating transient and intermittent faults. To keep hardware costs low, the system is built with off-the-shelf computers, and their extensions are kept as small as possible. To reduce operational costs, the system can be dynamically adapted to different fault-tolerance demands on a program-by-program basis. The operating system performs this adaptation transparently to the application software. We use a commercially available real-time operating system with a POSIX-compliant UNIX interface. The range of fault tolerance extends from a non-redundant system of stand-alone computers, through a master/checker configuration, to a TMR (triple modular redundancy) system. The high-performance network also allows the system to operate as a parallel multicomputer.
  • Keywords
    fault tolerant computing; multiprocessor interconnection networks; parallel architectures; PCI-bus; POSIX-compliant UNIX-interface; application software; dynamic fault tolerance; dynamically configurable multicomputer architecture; fault-tolerant computer system; intermittent faults; real-time operating system; transient faults; Computer architecture; Computer networks; Costs; Fault tolerance; Fault tolerant systems; Hardware; High-speed networks; Operating systems; Personal communication networks; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 1996. Proceedings., 15th Symposium on
  • Conference_Location
    Niagara-on-the-Lake, Ont.
  • ISSN
    1060-9857
  • Print_ISBN
    0-8186-7481-4
  • Type

    conf

  • DOI
    10.1109/RELDIS.1996.559691
  • Filename
    559691