Regional Information Center for Science and Technology - Designing reliable computer systems. The fault-tolerant approach

Abstract:

There are two strategies for increasing the reliability of a computer system. The first, called fault avoidance, is to try to remove the source of every possible fault that could give rise to an error condition. In practice this is not possible, so the second strategy, called fault tolerance, is to incorporate protective redundancy in the form of additional hardware, additional software and time replication. This first part of a two-part tutorial guide to the fault-tolerant approach is a general survey of the many techniques for improving fault tolerance, and also comments on the design of self-checking logic circuits. The second part will look in some detail at one particular implementation (the JPL-Star computer) and will conclude with a brief survey of other proposals and implementations.
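
As an illustration of the protective redundancy mentioned in the abstract, the following is a minimal sketch of majority voting over replicated results. It is not taken from the tutorial itself: the function names (`majority_vote`, `run_replicated`, `run_repeated`) and the choice of three replicas or attempts are assumptions made for this example only. The same voter is applied to copies of a module (hardware or software redundancy) and to repeated runs of one module (time replication).

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(outputs: Sequence[int]) -> int:
    """Return the value agreed on by a strict majority of the outputs."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No value has more than half of the votes, so the fault cannot be masked.
        raise RuntimeError("no majority among replicated outputs")
    return value


def run_replicated(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Hardware/software redundancy (hypothetical helper): run each copy once and vote."""
    return majority_vote([replica(x) for replica in replicas])


def run_repeated(compute: Callable[[int], int], x: int, attempts: int = 3) -> int:
    """Time replication (hypothetical helper): run the same unit several times and vote."""
    return majority_vote([compute(x) for _ in range(attempts)])
```

With three replicas, one of which always returns a wrong value, `run_replicated` still returns the result of the two healthy copies. Voting across repeated runs only masks transient faults; a permanent fault in the single unit would corrupt every attempt in the same way, which is why time replication is usually combined with the other forms of redundancy.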