Title :
Failure detection algorithms for a reliable execution of parallel programs
Author :
Chabridon, Sophie ; Gelenbe, Erol
Author_Institution :
UFR de Math. et Inf., Univ. Rene Descartes, Paris, France
Abstract :
We report on the design and simulation of novel algorithms that ensure application software runs correctly on a MIMD system in which processing units (PUs) can fail. The effect of these algorithms is evaluated by simulation on random task graphs as failure rates increase. A specific application, the Fast Fourier Transform, is also examined: we construct its task graph and then simulate its execution under various processor failure rates.
Keywords :
fault tolerant computing; parallel processing; reliability; system recovery; MIMD system; failure detection algorithms; failure rates; parallel programs; random task graphs; reliable execution; task graph; Algorithm design and analysis; Application software; Computational modeling; Databases; Delay; Detection algorithms; Fast Fourier transforms; Parallel processing; Software algorithms; Surges
Conference_Title :
Reliable Distributed Systems, 1995. Proceedings., 14th Symposium on
Conference_Location :
Bad Neuenahr
Print_ISBN :
0-8186-7153-X
DOI :
10.1109/RELDIS.1995.526230