• DocumentCode
    1159935
  • Title

    Design of algorithm-based fault-tolerant multiprocessor systems for concurrent error detection and fault diagnosis

  • Author

    Vinnakota, Bapiraju ; Jha, Niraj K.

  • Author_Institution
    Dept. of Electr. Eng., Minnesota Univ., Minneapolis, MN, USA
  • Volume
    5
  • Issue
    10
  • fYear
    1994
  • fDate
    10/1/1994 12:00:00 AM
  • Firstpage
    1099
  • Lastpage
    1106
  • Abstract
    Algorithm-based fault tolerance (ABFT) is a low-overhead system-level concurrent error detection and fault location scheme for multiprocessor systems. We present new methods for the design of ABFT systems. Our design procedure is applicable to a wide range of systems in which processors share data elements. A feature of our design approach is that the type of checks to be used in the final system can be controlled by the system designer. We also present some new bounds on the number of checks needed in ABFT system design.
  • Keywords
    fault location; fault tolerant computing; multiprocessing systems; parallel architectures; reliability; system recovery; ABFT system design; ABFT systems; algorithm-based fault tolerance; algorithm-based multiprocessor systems; concurrent error detection; data element sharing; design procedure; fault diagnosis; fault location scheme; fault-tolerant multiprocessor systems; low-overhead system-level error detection; Algorithm design and analysis; Control systems; Design methodology; Fault detection; Fault diagnosis; Fault location; Fault tolerance; Fault tolerant systems; Multiprocessing systems; Signal processing algorithms;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.313125
  • Filename
    313125