Title :
Design of multiprocessor systems for concurrent error detection and fault diagnosis
Author :
Vinnakota, B. ; Jha, N.K.
Author_Institution :
Dept. of Electr. Eng., Princeton Univ., NJ, USA
Abstract :
Results on the design of systems using algorithm-based fault tolerance (ABFT), a low-overhead fault tolerance scheme for high-speed parallel processing systems, are presented. Bounds on the diagnosability of the system and the number of checks needed to design a unit system of given capability are derived. A procedure for forming the target fault-tolerant system from the unit system is introduced. The procedure is applicable to a wide range of systems in which processors may share data elements. The applications of the design scheme are illustrated through examples.
Keywords :
error detection; fault location; fault tolerant computing; multiprocessing systems; parallel processing; algorithm-based fault tolerance; concurrent error detection; design scheme; diagnosability; fault diagnosis; low-overhead fault tolerance scheme; multiprocessor systems; parallel processing systems; target fault-tolerant system; unit system; Concurrent computing; Electrical fault detection; Fault detection; Fault diagnosis; Fault tolerance; Fault tolerant systems; Multiprocessing systems; Parallel architectures; Parallel processing; Upper bound
Conference_Title :
Fault-Tolerant Computing, 1991. FTCS-21. Digest of Papers., Twenty-First International Symposium
Conference_Location :
Montreal, Quebec, Canada
Print_ISBN :
0-8186-2150-8
DOI :
10.1109/FTCS.1991.146708