Author_Institution :
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
Abstract :
Circuitry added to fault-tolerant systems for concurrent error detection usually reduces performance. Using a technique called micro rollback, it is possible to eliminate most of the performance penalty of concurrent error detection. Error detection is performed in parallel with intermodule communication, and erroneous state changes are later undone. The author reports on the design and implementation of a VLSI RISC microprocessor, called the Mirror Processor (MP), which is capable of micro rollback. To achieve concurrent error detection, two MP chips operate in lockstep, comparing external signals and a signature of internal signals every clock cycle. If a mismatch is detected, both processors roll back to the beginning of the cycle in which the error occurred. In some cases the erroneous state is corrected by copying a value from the fault-free processor to the faulty processor. The architecture, microarchitecture, and VLSI implementation of the MP are described, with emphasis on its error-detection, error-recovery, and self-diagnosis capabilities.
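The lockstep detect-and-undo cycle summarized above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the MP's actual design. The toy processors, the `add rd, rs, imm` instruction format, the CRC-32 signature over each cycle's internal values, the delayed write buffer that holds uncommitted register writes, and the `ROLLBACK_DEPTH` of 4 are all hypothetical. Faults are modeled as single-bit transients, so re-executing the rolled-back cycle succeeds. The case in which a value is copied from the fault-free processor is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
import zlib

ROLLBACK_DEPTH = 4  # hypothetical: number of recent cycles whose writes can still be undone


@dataclass
class Processor:
    """Toy processor: committed registers plus a delayed write buffer (assumed mechanism)."""
    regs: list = field(default_factory=lambda: [0] * 8)
    pending: deque = field(default_factory=deque)  # uncommitted (reg, value) writes, oldest first

    def read(self, r):
        # The newest uncommitted write to r wins; otherwise read the committed register.
        for reg, val in reversed(self.pending):
            if reg == r:
                return val
        return self.regs[r]

    def step(self, instr, fault=False):
        """Execute one hypothetical 'add rd, rs, imm'; return a signature of this cycle."""
        rd, rs, imm = instr
        result = self.read(rs) + imm
        if fault:
            result ^= 1  # inject a single-bit transient error
        self.pending.append((rd, result))
        if len(self.pending) > ROLLBACK_DEPTH:
            reg, val = self.pending.popleft()  # oldest write is beyond rollback reach: commit it
            self.regs[reg] = val
        # Assumed signature: CRC-32 over this cycle's internal values.
        return zlib.crc32(bytes(f"{rd},{rs},{result}", "ascii"))

    def rollback(self, cycles=1):
        """Undo the state changes made in the most recent `cycles` cycles."""
        for _ in range(cycles):
            self.pending.pop()


def run_lockstep(program, fault_at=None):
    """Run two processors in lockstep, comparing signatures every cycle."""
    a, b = Processor(), Processor()
    pc, cycle, rollbacks = 0, 0, 0
    while pc < len(program):
        sig_a = a.step(program[pc], fault=(cycle == fault_at))
        sig_b = b.step(program[pc])
        cycle += 1
        if sig_a != sig_b:
            # Mismatch: both processors return to the start of the erroneous cycle
            # and re-execute the same instruction.
            a.rollback()
            b.rollback()
            rollbacks += 1
            continue
        pc += 1
    return a, b, rollbacks


if __name__ == "__main__":
    prog = [(1, 0, 5), (2, 1, 3), (3, 2, 10)]
    a, b, n = run_lockstep(prog, fault_at=1)
    print("rollbacks:", n, "r3:", a.read(3), b.read(3))
```

In this sketch the error is detected in the same cycle it occurs, so a one-cycle rollback suffices. A deeper write buffer would let detection lag execution by several cycles, which is the kind of slack that allows checking to overlap with normal operation instead of stalling it.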
Keywords :
CMOS integrated circuits; VLSI; computer architecture; fault tolerant computing; microprocessor chips; reduced instruction set computing; Mirror Processor; VLSI RISC microprocessor; chips operate in lockstep; comparing external signals; concurrent error detection; design; error-detection; error-recovery; fault-tolerant systems; implementation; micro rollback; performance penalty minimisation; self-checking modules; self-diagnosis; self-repair; signature of internal signals; Circuit faults; Clocks; Computer errors; Electrical fault detection; Fault detection; Fault tolerant systems; Microprocessors; Mirrors; Reduced instruction set computing; Very large scale integration;