• DocumentCode
    2032424
  • Title

    Design of Algorithm-Based Fault Tolerant Systems with In-System Checks

  • Author

    Yajnik, Shalini ; Jha, Niraj K.

  • Author_Institution
    Princeton University
  • Volume
    1
  • fYear
    1993
  • fDate
    16-20 Aug. 1993
  • Firstpage
    246
  • Lastpage
    253
  • Abstract
    To improve the reliability of compute-intensive applications run on multiprocessor architectures, fault tolerance is introduced into the system with on-line detection and location of faults. This can be achieved by a low-cost scheme, called Algorithm-based fault tolerance (ABFT), which encodes data at the system level and modifies the algorithm to operate on the encoded data. The resultant encoded output data is checked for correctness by some checks. In this paper we present an extended model for representing and designing ABFT systems. The model takes into consideration the processors evaluating the checks. We propose a design method which considers the processors computing the checks to be a part of the ABFT system and guarantees concurrent error detection even in the presence of faults in these processors, unlike most methods presented earlier.
  • Keywords
    Algorithm design and analysis; Computer applications; Concurrent computing; Design methodology; Electrical fault detection; Fault detection; Fault tolerant systems; Multiprocessing systems; Parallel processing; Process design;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing, 1993. ICPP 1993. International Conference on
  • Conference_Location
    Syracuse, NY, USA
  • ISSN
    0190-3918
  • Print_ISBN
    0-8493-8983-6
  • Type

    conf

  • DOI
    10.1109/ICPP.1993.70
  • Filename
    4134148