DocumentCode
2032424
Title
Design of Algorithm-Based Fault Tolerant Systems with In-System Checks
Author
Yajnik, Shalini ; Jha, Niraj K.
Author_Institution
Princeton University
Volume
1
fYear
1993
fDate
16-20 Aug. 1993
Firstpage
246
Lastpage
253
Abstract
To improve the reliability of compute-intensive applications run on multiprocessor architectures, fault tolerance is introduced into the system through on-line detection and location of faults. This can be achieved with a low-cost scheme called algorithm-based fault tolerance (ABFT), which encodes data at the system level and modifies the algorithm to operate on the encoded data. The resulting encoded output data are then checked for correctness by a set of checks. In this paper we present an extended model for representing and designing ABFT systems. The model takes into consideration the processors that evaluate the checks. We propose a design method that treats the processors computing the checks as part of the ABFT system and, unlike most previously presented methods, guarantees concurrent error detection even in the presence of faults in these processors.
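As an illustration of what "encoding data and modifying the algorithm to operate on the encoded data" means, the sketch below shows the classic checksum-matrix example of ABFT for matrix multiplication (the row/column checksum scheme of Huang and Abraham). It is not the extended model or design method proposed in this paper. Assumptions: integer matrices (so checks are exact, with no floating-point tolerance), and the checks run on fault-free hardware, which is exactly the assumption this paper relaxes. The function names are hypothetical.

```python
# Minimal sketch of checksum-based ABFT for matrix multiplication.
# Assumptions (illustrative only): integer matrices, so checksum comparisons are
# exact; the check functions themselves are assumed to execute without faults.

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def column_checksum(a):
    """Encode A by appending a row that holds the sum of each column."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Encode B by appending to each row the sum of that row."""
    return [row[:] + [sum(row)] for row in b]

def check(c_full):
    """Return (bad_rows, bad_cols): data rows/columns whose checksum fails."""
    data = [row[:-1] for row in c_full[:-1]]
    bad_rows = [i for i, row in enumerate(data) if sum(row) != c_full[i][-1]]
    bad_cols = [j for j, col in enumerate(zip(*data)) if sum(col) != c_full[-1][j]]
    return bad_rows, bad_cols

def correct_single_error(c_full):
    """If exactly one row and one column check fail, repair that element in place."""
    bad_rows, bad_cols = check(c_full)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        others = sum(c_full[i][k] for k in range(len(c_full[i]) - 1) if k != j)
        c_full[i][j] = c_full[i][-1] - others
        return (i, j)
    return None

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    # Multiplying the encoded operands yields the full checksum matrix of A*B.
    C_full = matmul(column_checksum(A), row_checksum(B))
    print(check(C_full))                 # ([], [])
    C_full[1][0] += 5                    # simulate a faulty processor corrupting one element
    print(check(C_full))                 # ([1], [0]): error located at row 1, column 0
    print(correct_single_error(C_full))  # (1, 0)
    print(check(C_full))                 # ([], [])
```

The failing row check and the failing column check intersect at the corrupted element, which is how this encoding supports both detection and location of a single faulty output.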
Keywords
Algorithm design and analysis; Computer applications; Concurrent computing; Design methodology; Electrical fault detection; Fault detection; Fault tolerant systems; Multiprocessing systems; Parallel processing; Process design;
fLanguage
English
Publisher
IEEE
Conference_Title
1993 International Conference on Parallel Processing (ICPP 1993)
Conference_Location
Syracuse, NY, USA
ISSN
0190-3918
Print_ISBN
0-8493-8983-6
Type
conf
DOI
10.1109/ICPP.1993.70
Filename
4134148