Title :
Fault recovery characteristics of the fault tolerant multiprocessor
Author :
Padilla, Peter A.
Author_Institution :
NASA Langley Res. Center, Hampton, VA, USA
Abstract :
The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Keywords :
fault tolerant computing; multiprocessing systems; FTMP; active faults; error latch processing; fault handling performance; fault injection experiments; fault management software; fault recovery characteristics; fault tolerant multiprocessor; hard faults; intermittent faults; line replaceable unit; system design; ultra-reliable regime; Aerospace electronics; Application software; Data acquisition; Fault detection; Fault tolerance; Fault tolerant systems; Laboratories; NASA; Performance evaluation; System testing
Conference_Title :
Digital Avionics Systems Conference, 1990. Proceedings., IEEE/AIAA/NASA 9th
Conference_Location :
Virginia Beach, VA
DOI :
10.1109/DASC.1990.111293