Title :
Error detection mechanisms for massively parallel multiprocessors
Author :
Dal Cin, M. ; Hohl, W. ; Michel, E. ; Pataricza, A.
Author_Institution :
Math. Inst., Erlangen-Nurnberg Univ., Germany
Abstract :
A survey of the most important methods for error detection in multiprocessor systems is presented. A detailed comparison between watchdog-processor-based and master-checker-based fault tolerance is given. Fault coverage, hardware overhead, and run-time overhead are discussed, based on experience gained in the development of the MEMSY fault-tolerant multiprocessor system. The cumulative effects of simultaneously using different hardware-near and high-level fault-tolerance mechanisms are shown.
Keywords :
error detection; fault tolerant computing; parallel machines; MEMSY fault-tolerant multiprocessor system; error detection mechanisms; fault coverage; hardware; massively parallel multiprocessors; master-checker based fault tolerance; run-time overhead; watchdog processor based fault tolerance; Application software; Computer architecture; Concurrent computing; Delay; Fault detection; Fault tolerance; Hardware; Multiprocessing systems; Redundancy; Testing;
Conference_Title :
Parallel and Distributed Processing, 1993. Proceedings. Euromicro Workshop on
Conference_Location :
Gran Canaria
Print_ISBN :
0-8186-3610-6
DOI :
10.1109/EMPDP.1993.336378