Author_Institution :
Dept. of Electr. Eng., Bradley Univ., Peoria, IL, USA
Abstract :
Two common dependability measures of fault-tolerant systems, availability and reliability, are defined, and the design of a reliable system is described. The first step is the incoming inspection of components to remove weak or defective parts before they are assembled into subsystems. The next step is to apply fault-avoidance techniques. Examples include careful signal routing, shielding, cabinet grounding, and inline static filters, which increase the signal-to-noise ratio; limiting gate fanout to a small number to reduce power dissipation; and minimizing human errors through measures such as labeling and documentation. Reliability can be increased further by using a fault-tolerant design. Three approaches are examined: (1) fault-detection techniques (duplication, coding techniques, the check sum method, watchdog timers and timeouts, and consistency and capability checking); (2) fault-masking techniques (triple modular redundancy and error-correcting codes); and (3) dynamic redundancy techniques. System diagnosis and applications of fault-tolerant systems are also discussed.
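The sketch below illustrates a few of the measures and techniques named in the abstract, using common textbook forms. These are assumptions, not the paper's own definitions. Availability is taken as the steady-state ratio MTTF / (MTTF + MTTR). Reliability assumes a constant failure rate λ, so R(t) = e^(-λt). The check sum is a simple modular sum of data words, one of several possible variants. Triple modular redundancy (TMR) is modeled as a bitwise majority voter, with the classic reliability 3R² - 2R³ under the further assumptions of a perfect voter and independent modules. All function names are hypothetical.

```python
import math
from typing import Sequence


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the system is up (assumed definition)."""
    return mttf / (mttf + mttr)


def reliability_exp(failure_rate: float, t: float) -> float:
    """Probability of surviving to time t, assuming a constant failure rate."""
    return math.exp(-failure_rate * t)


def checksum(words: Sequence[int], width: int = 8) -> int:
    """Fault detection: modular sum of data words, truncated to `width` bits."""
    mask = (1 << width) - 1
    total = 0
    for w in words:
        total = (total + w) & mask
    return total


def detect_error(words: Sequence[int], stored: int, width: int = 8) -> bool:
    """Report an error if the recomputed check sum differs from the stored one."""
    return checksum(words, width) != stored


def tmr_vote(a: int, b: int, c: int) -> int:
    """Fault masking: bitwise majority of three module outputs.

    Any single faulty module is outvoted by the other two.
    """
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r: float) -> float:
    """TMR reliability with a perfect voter and independent modules of reliability r."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    data = [0x12, 0x34, 0xFF]
    stored = checksum(data)                        # 0x45 for this data
    print(detect_error(data, stored))              # False: data intact
    print(detect_error([0x12, 0x35, 0xFF], stored))  # True: one word corrupted
    print(tmr_vote(0b1010, 0b1010, 0b0110))        # 10: the faulty third module is masked
    print(tmr_reliability(0.9))                    # about 0.972, better than a single module
```

The last check shows why TMR helps only when module reliability is high. For r = 0.4, 3r² - 2r³ = 0.352, which is worse than a single module.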
Keywords :
encoding; error correction codes; fault tolerant computing; minimisation; availability; cabinet grounding; capability checking; check sum method; coding; consistency; dependability measures; documentation; duplication; dynamic redundancy techniques; error-correcting codes; fault-avoidance techniques; fault-detection techniques; fault-masking techniques; fault-tolerant systems; human errors; inline static filters; inspection; labeling; minimization; reliability; shielding; signal routing; signal-to-noise ratio; timeouts; triple modular redundancy; watchdog timers; Assembly; Availability; Digital systems; Fault tolerant systems; Filters; Grounding; Inspection; Redundancy; Routing; Signal to noise ratio;