Title :
A probabilistic method for fault diagnosis of multiprocessor systems
Author :
Rangarajan, S. ; Fussell, D.
Author_Institution :
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
Abstract :
The authors present a system-level fault-diagnosis algorithm for identifying faulty and fault-free units in a homogeneous system of computing elements. The algorithm is based on a comparison approach: the units perform tasks and their outputs are compared with one another. Unlike other approaches, the authors' algorithm requires no global syndrome analysis and can therefore run in real time as a background task during system operation. The time required to perform the diagnosis is constant regardless of the number of units in the system. As with previous global syndrome-based approaches, the algorithm's accuracy is remarkably high, because it uses information about individual comparison results that is lost when these results are summarized in a global syndrome.
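The comparison approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the ring topology, the fixed number `k` of comparison partners per unit, the strict-majority decision rule, the helper names `run_tasks` and `local_diagnosis`, and the fault model (a faulty unit returns a random wrong output with probability `p_wrong`) are all hypothetical assumptions. The sketch only shows how each unit can be classified from a constant number of individual comparison results, without assembling a global syndrome.

```python
import random


def run_tasks(faulty, rng, p_wrong=0.9):
    """Every unit executes the same task (hypothetical fault model).

    Fault-free units return the correct value 0. A faulty unit returns a
    random wrong value with probability p_wrong, otherwise the correct 0.
    """
    outputs = []
    for is_faulty in faulty:
        if is_faulty and rng.random() < p_wrong:
            outputs.append(rng.randint(1, 10**6))  # arbitrary wrong output
        else:
            outputs.append(0)  # correct output
    return outputs


def local_diagnosis(outputs, k):
    """Classify each unit from k pairwise comparisons only.

    Unit i compares its output with the next k units on a ring (a
    hypothetical topology). It is judged fault-free (True) if it agrees
    with a strict majority of those k partners. The work per unit is k
    comparisons, independent of the number of units n.
    """
    n = len(outputs)
    verdict = []
    for i in range(n):
        agreements = sum(
            outputs[i] == outputs[(i + d) % n] for d in range(1, k + 1)
        )
        verdict.append(2 * agreements > k)
    return verdict


if __name__ == "__main__":
    rng = random.Random(42)
    n, k = 64, 5
    faulty = [rng.random() < 0.1 for _ in range(n)]  # roughly 10% faulty
    outputs = run_tasks(faulty, rng)
    verdict = local_diagnosis(outputs, k)
    correct = sum(v == (not f) for v, f in zip(verdict, faulty))
    print(f"{correct}/{n} units classified correctly")
```

Because each unit examines only `k` comparisons, the work per unit does not grow with `n`. The trade-off in this simplified rule is that a fault-free unit can be misclassified when several of its `k` partners happen to be faulty, which is why a probabilistic analysis of accuracy is needed.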
Keywords :
fault location; fault tolerant computing; multiprocessing systems; fault free processor identification; faulty processor identification; multiprocessor systems; probabilistic method; real time; system-level fault-diagnosis algorithm; Algorithm design and analysis; Costs; Fault detection; Fault diagnosis; Multiprocessing systems; Performance analysis; Performance evaluation; Real time systems; Redundancy; System testing;
Conference_Title :
Fault-Tolerant Computing, 1988. FTCS-18, Digest of Papers, Eighteenth International Symposium on
Conference_Location :
Tokyo, Japan
Print_ISBN :
0-8186-0867-6
DOI :
10.1109/FTCS.1988.5332