Title :
Exploiting redundancy to speed up parallel systems
Author :
Yen, I. Ling ; Leiss, Ernst L. ; Bastani, Farokh B.
Author_Institution :
Dept. of Comput. Sci., Michigan State Univ., East Lansing, MI, USA
Abstract :
Repetitive fault tolerance takes advantage of redundant processors to offer peak performance during normal execution, and graceful performance degradation when processors fail. As long as one processor is working, the computation can continue. The authors use the underlying principle of inherent fault tolerance, turning redundancy into computation power, to design a model of repetitive fault tolerance that is suitable for dataflow computations. When no processors fail, they all work in parallel to achieve performance almost equal to that of the parallel program without fault tolerance. If processors do fail, the program can still derive the correct result as long as at least one processor is working; failures only slow the computation speed. Repetitive fault tolerance also provides a systematic way to derive fault-tolerant programs.
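The behaviour described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model: it assumes independent tasks, a shared result table that survives processor failures, and processors that fail before doing any work. The names `run`, `tasks`, `n_workers`, and `failed` are hypothetical. Every working processor is prepared to execute the whole task list, starts at its own offset, and skips tasks that another processor has already finished.

```python
def run(tasks, n_workers, failed=frozenset()):
    """Simulate lock-step rounds; return (results in task order, rounds used).

    Assumptions (not from the paper): tasks are independent callables,
    completed results live in a shared table, and workers listed in
    `failed` never execute anything.
    """
    n = len(tasks)
    results = {}  # shared table: task index -> result
    # cursor[w] counts how far worker w has scanned through its rotated list
    cursor = {w: 0 for w in range(n_workers) if w not in failed}
    if not cursor:
        raise RuntimeError("at least one processor must be working")

    rounds = 0
    while len(results) < n:
        rounds += 1
        for w in list(cursor):
            start = w * n // n_workers  # each worker begins at its own offset
            c = cursor[w]
            # skip tasks that some processor has already completed
            while c < n and (start + c) % n in results:
                c += 1
            if c < n:
                idx = (start + c) % n
                results[idx] = tasks[idx]()  # do one task this round
                c += 1
            cursor[w] = c
    return [results[i] for i in range(n)], rounds


if __name__ == "__main__":
    jobs = [lambda i=i: i * i for i in range(8)]
    print(run(jobs, 4))                    # all four processors working
    print(run(jobs, 4, failed={1, 2, 3}))  # a single survivor
```

In this toy setting, four working processors finish eight tasks in two rounds, the same as a plain parallel split, while a single survivor still produces the same results in eight rounds. Failures cost time, not correctness.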
Keywords :
fault tolerant computing; parallel programming; redundancy; computation power; dataflow computations; fault tolerance; inherent fault tolerance; parallel program; peak performance; redundant processors; repetitive fault tolerance; Algorithm design and analysis; Concurrent computing; Degradation; Fault tolerance; Fault tolerant systems; IEEE Computer Society Press; Matrix decomposition; Redundancy; Sorting
Journal_Title :
Parallel & Distributed Technology: Systems & Applications, IEEE