Title :
Efficient utilization of spare capacity for fault detection and location in multiprocessor systems
Author :
Tridandapani, S. ; Somani, A.K.
Author_Institution :
Washington Univ., Seattle, WA, USA
Abstract :
One scheme for detecting faults at the processor level in a multiprocessor system (see A. Dahbura et al., 1989) works by running secondary versions of jobs on the unused, or spare, processors of the system. The authors build upon this scheme and propose three new multiprocessor allocation strategies that run a viable number of versions per job. These schemes permit online detection and, in some cases, location of faulty processors in a system, without degrading its delay/throughput performance. Two new metrics, the fault detection capability and the fault location capability, are introduced to evaluate these schemes. Extensive simulation results are provided to show that these schemes utilize spare capacity more efficiently, thereby improving upon the fault detection and location capabilities of the system.
Keywords :
computer testing; fault location; fault tolerant computing; multiprocessing systems; resource allocation; fault detection; fault detection capability; fault location; fault location capability; multiprocessor allocation strategies; multiprocessor system; multiprocessor systems; spare capacity; Acceleration; Computational modeling; Computer science; Degradation; Electrical fault detection; Fault detection; Fault location; Hardware; Multiprocessing systems; Throughput;
Conference_Title :
Twenty-Second International Symposium on Fault-Tolerant Computing (FTCS-22), 1992. Digest of Papers
Conference_Location :
Boston, MA, USA
Print_ISBN :
0-8186-2875-8
DOI :
10.1109/FTCS.1992.243591
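The abstract summarizes the underlying idea only at a high level: extra versions of a job run on spare processors, and their outputs are compared. The Python sketch below illustrates that idea under a toy model. It is not one of the three allocation strategies proposed by the authors. Every name in it (`allocate_versions`, `check_job`, `coverage`, `MAX_VERSIONS`) is hypothetical, and so are the assumptions it makes. Each job gets one primary processor, and spare processors are handed out round-robin as secondary versions, up to three per job. Two disagreeing versions can only detect a fault. Three or more versions with a strict majority can also locate the dissenting processor. The `coverage` function is an illustrative proxy only, not the paper's definition of fault detection or fault location capability.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: three versions are enough for majority voting to locate one fault.
MAX_VERSIONS = 3


def allocate_versions(num_jobs: int, num_processors: int) -> List[List[int]]:
    """Hypothetical allocator (not the paper's strategies).

    Job j runs its primary version on processor j. The remaining (spare)
    processors are assigned round-robin as secondary versions, capped at
    MAX_VERSIONS versions per job.
    """
    if num_jobs > num_processors:
        raise ValueError("need at least one processor per job")
    placement = [[j] for j in range(num_jobs)]
    spare = list(range(num_jobs, num_processors))
    j = 0
    # Stop when spares run out or every job already has MAX_VERSIONS versions.
    while spare and any(len(p) < MAX_VERSIONS for p in placement):
        if len(placement[j]) < MAX_VERSIONS:
            placement[j].append(spare.pop(0))
        j = (j + 1) % num_jobs
    return placement


def check_job(procs: List[int], outputs: Dict[int, int]) -> Tuple[str, List[int]]:
    """Compare the outputs of one job's versions.

    Returns (status, suspected_faulty_processors), where status is one of:
    "unchecked" (single version), "agree", "detected" (disagreement but no
    strict majority), or "located" (strict majority identifies dissenters).
    """
    results = [outputs[p] for p in procs]
    if len(results) < 2:
        return "unchecked", []
    value, count = Counter(results).most_common(1)[0]
    if count == len(results):
        return "agree", []
    if len(results) >= 3 and count > len(results) // 2:
        return "located", [p for p in procs if outputs[p] != value]
    return "detected", []


def coverage(placement: List[List[int]]) -> Tuple[float, float]:
    """Illustrative proxy: fraction of jobs that can detect (>= 2 versions)
    and locate (>= 3 versions) a single faulty processor."""
    n = len(placement)
    detect = sum(len(p) >= 2 for p in placement) / n
    locate = sum(len(p) >= 3 for p in placement) / n
    return detect, locate


if __name__ == "__main__":
    placement = allocate_versions(num_jobs=3, num_processors=7)
    print(placement)  # [[0, 3, 6], [1, 4], [2, 5]]
    # Toy run: every correct version returns 42; processor 4 is faulty.
    outputs = {p: 42 for p in range(7)}
    outputs[4] = 99
    for job, procs in enumerate(placement):
        print(job, check_job(procs, outputs))
    print(coverage(placement))
```

In this toy run the fault on processor 4 is detected, but not located, because job 1 has only two versions. A fault on processor 3 would be located, because job 0 has three versions. This illustrates why the number of versions each job receives from the spare pool matters for detection versus location.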