DocumentCode :
1448212
Title :
Adaptive system-level diagnosis for hypercube multiprocessors
Author :
Feng, Chao ; Bhuyan, Laxmi N. ; Lombardi, Fabrizio
Author_Institution :
Land Mobile Product Center, Motorola Inc., Schaumburg, IL, USA
Volume :
45
Issue :
10
fYear :
1996
fDate :
10/1/1996 12:00:00 AM
Firstpage :
1157
Lastpage :
1170
Abstract :
System-level diagnosis is an important technique for fault detection and location in multiprocessor computing systems. Efficient diagnosis is highly desirable for sustaining the original computing power of the system. Moreover, effective diagnosis is particularly important for a multiprocessor system with high scalability but low connectivity. Most existing results are not applicable in practice because of their high diagnosis cost and limited diagnosability. Over-d fault diagnosis, where d is the diagnosability, has only been addressed in the literature using a probabilistic method. To address these two issues, we propose a hierarchical adaptive system-level diagnosis approach for hypercube systems using a divide-and-conquer strategy. We first propose a conceptual algorithm, HADA, to formulate a rigorous analysis. We then present its practical variant, IHADA. In HADA and IHADA, the over-d fault problem is inherently tackled through a deterministic method. Three measures of diagnosis cost (diagnosis time, number of tests, and number of test links) are analyzed for the proposed algorithms. It is proved that the diagnosis cost of our approach is lower than that of previous diagnosis algorithms. It is shown that the diagnosis cost of the proposed algorithms depends on the number and location of faulty units in the system, and that the cost is extremely low when only a small number of faulty units exist. It is also shown that our algorithms have lower costs than a pessimistic diagnosis algorithm, which trades a lower degree of accuracy for lower diagnosis cost. Experimental results on the nCUBE are provided.
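The abstract describes a divide-and-conquer strategy over hypercube systems. As a purely illustrative aid, the sketch below shows only the hypercube structure such a strategy can exploit: nodes of an n-cube are labelled 0 to 2^n - 1, neighbours differ in exactly one bit, and fixing one bit splits an n-cube into two (n-1)-subcubes. It is not HADA or IHADA; the function names `neighbors`, `split`, and `decompose` and the highest-dimension-first splitting order are hypothetical choices made for this example.

```python
# Hypothetical illustration only: NOT the paper's HADA/IHADA algorithm.
# It shows how an n-dimensional hypercube can be recursively halved into
# subcubes, the structural property a divide-and-conquer diagnosis can use.


def neighbors(node: int, n: int) -> list[int]:
    """Return the n neighbours of `node` in an n-cube (flip one bit each)."""
    return [node ^ (1 << k) for k in range(n)]


def split(subcube: list[int], dim: int) -> tuple[list[int], list[int]]:
    """Split a subcube into two halves by the value of bit `dim`."""
    low = [v for v in subcube if not (v >> dim) & 1]
    high = [v for v in subcube if (v >> dim) & 1]
    return low, high


def decompose(n: int, min_dim: int) -> list[list[int]]:
    """Halve an n-cube repeatedly until every piece is a min_dim-subcube.

    Assumption for this sketch: split along the highest dimension first;
    each pass doubles the number of pieces.
    """
    pieces = [list(range(2 ** n))]
    for dim in range(n - 1, min_dim - 1, -1):
        next_pieces = []
        for piece in pieces:
            low, high = split(piece, dim)
            next_pieces.extend([low, high])
        pieces = next_pieces
    return pieces


if __name__ == "__main__":
    print(neighbors(5, 3))   # [4, 7, 1]
    print(decompose(3, 1))   # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Each resulting piece is itself a smaller hypercube, so any per-subcube procedure can be applied independently to the pieces and the results combined.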
Keywords :
fault diagnosis; hypercube networks; multiprocessing systems; HADA; diagnosability; divide-and-conquer; fault diagnosis; hierarchical adaptive; hypercube multiprocessors; hypercube systems; multiprocessor computing systems; system-level diagnosis; Adaptive systems; Algorithm design and analysis; Costs; Fault detection; Fault diagnosis; Hypercubes; Multiprocessing systems; Scalability; Testing; Time measurement;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.543709
Filename :
543709