Title :
On probabilistic diagnosis of multiprocessor systems using multiple syndromes
Author :
Lee, Sunggu ; Shin, Kang G.
Author_Institution :
Dept. of Electron. & Electr. Eng., Pohang Univ. of Sci. & Technol., South Korea
fDate :
6/1/1994
Abstract :
This paper addresses the distributed self-diagnosis of a multiprocessor/multicomputer system based on fault syndromes formed by comparison testing. The authors show that by using multiple fault syndromes, it is possible to achieve significantly better diagnosis than by using a single fault syndrome, even when the amount of time devoted to testing is the same. They derive a multiple syndrome diagnosis algorithm that, in terms of the level of diagnostic accuracy achieved, is globally suboptimal but optimal among all diagnosis algorithms of a certain type to be defined. The diagnosis algorithm produces good results, even with sparse interconnection networks and interprocessor tests with low fault coverage. It is also proven that the diagnosis algorithm produces 100% correct diagnosis as N, the number of nodes in the system, approaches ∞, provided that the interconnection network has connectivity greater than or equal to 2 and that the number of syndromes produced grows faster than log N. This solution and another multiple syndrome diagnosis solution by Fussell and Rangarajan (1989) are comparatively evaluated, both analytically and with simulations.
Keywords :
fault tolerant computing; multiprocessing systems; performance evaluation; probability; comparison testing; diagnosis algorithms; diagnostic accuracy; distributed self-diagnosis; fault-tolerant computing; intermittent fault; interprocessor tests; low fault coverage; multicomputer; multiple syndromes; multiprocessor; multiprocessor systems; probabilistic diagnosis; self-test; sparse interconnection networks; system-level diagnosis; Analytical models; Automatic testing; Built-in self-test; Circuit faults; Circuit testing; Fault diagnosis; Multiprocessing systems; Multiprocessor interconnection networks; System testing; Upper bound
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on