Title :
Error detection by duplicated instructions in super-scalar processors
Author :
Oh, Nahmsuk ; Shirvani, Philip P. ; McCluskey, Edward J.
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., CA, USA
fDate :
3/1/2002 12:00:00 AM
Abstract :
This paper proposes a pure software technique, "error detection by duplicated instructions" (EDDI), for detecting errors during normal system operation. Unlike error-detection techniques that rely on hardware redundancy, EDDI requires no hardware modifications to add error-detection capability to the original system. EDDI duplicates instructions during compilation and uses different registers and variables for the new instructions. For faults in the code segment of memory in particular, formulas are derived to estimate the error-detection coverage of EDDI using probabilistic methods. These formulas use statistics of the program, which are collected during compilation. EDDI was applied to eight benchmark programs and the error-detection coverage was estimated. The estimates were then verified by simulation, in which a fault injector forced a bit-flip in the code segment of the executable machine code. The simulation results validated the estimated fault coverage and showed that approximately 1.5% of injected faults produced incorrect results in the eight benchmark programs with EDDI, whereas on average 20% of injected faults produced undetected incorrect results in the programs without EDDI. Based on the theoretical estimates and the fault-injection experiments, EDDI can provide over 98% fault coverage without any extra hardware for error detection. This pure software technique is especially useful when designers cannot change the hardware but still need dependability in the computer system. To reduce the performance overhead, EDDI schedules the instructions added for error detection so that "instruction-level parallelism" (ILP) is maximized. Performance overhead can thus be reduced by increasing ILP within a single super-scalar processor. The execution-time overhead on a 4-way super-scalar processor is lower than on processors that can issue two instructions per cycle.
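The duplicate-and-compare idea described in the abstract can be illustrated with a minimal sketch on a toy register machine. This is not the paper's compiler pass: the instruction set (`li`, `add`, `store`, `check`), the shadow-register offset, and every function name below are assumptions made for illustration, and the sketch shadows registers only, whereas the abstract says EDDI also uses different variables. The fault injector flips one bit in an instruction's immediate operand, loosely mimicking a bit-flip in the code segment.

```python
# Toy illustration of duplicated instructions with a compare before each store.
# All names and the instruction set are hypothetical, not taken from the paper.

SHADOW_OFFSET = 8  # assumption: r0..r7 are master registers, r8..r15 are shadows


class ErrorDetected(Exception):
    """Raised when a master register and its shadow copy disagree."""


def duplicate(program):
    """Duplicate each computing instruction onto shadow registers and
    insert a master/shadow comparison before every store."""
    out = []
    for op, *args in program:
        if op == "store":
            addr, src = args
            out.append(("check", src, src + SHADOW_OFFSET))
            out.append(("store", addr, src))
        elif op == "li":
            dst, imm = args
            out.append(("li", dst, imm))
            out.append(("li", dst + SHADOW_OFFSET, imm))
        else:  # "add"
            dst, a, b = args
            out.append((op, dst, a, b))
            out.append((op, dst + SHADOW_OFFSET,
                        a + SHADOW_OFFSET, b + SHADOW_OFFSET))
    return out


def run(program):
    """Execute a program and return the resulting memory as a dict."""
    regs = [0] * (2 * SHADOW_OFFSET)
    mem = {}
    for op, *args in program:
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "check":
            if regs[args[0]] != regs[args[1]]:
                raise ErrorDetected(f"r{args[0]} != r{args[1]}")
        elif op == "store":
            mem[args[0]] = regs[args[1]]
    return mem


def flip_immediate_bit(program, index, bit):
    """Return a copy of the program with one bit flipped in the immediate
    of the 'li' instruction at the given index (a simulated code fault)."""
    faulty = list(program)
    op, dst, imm = faulty[index]
    faulty[index] = (op, dst, imm ^ (1 << bit))
    return faulty


# r1 = 5; r2 = 7; r3 = r1 + r2; mem[0] = r3
PROGRAM = [("li", 1, 5), ("li", 2, 7), ("add", 3, 1, 2), ("store", 0, 3)]
```

In this sketch, running `flip_immediate_bit(PROGRAM, 0, 0)` stores a wrong value without any indication, while running the same fault against `duplicate(PROGRAM)` raises `ErrorDetected` at the comparison, because only the master copy of the corrupted instruction is affected.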
Keywords :
error detection; instruction sets; parallel architectures; software fault tolerance; software reliability; concurrent error detection; error detection by duplicated instructions; error-detection coverage; error-detection coverage estimation; execution time overhead; fault tolerance; fault-coverage; instruction-level parallelism; instruction-scheduling; instructions duplication; memory code segment fault; performance overhead reduction; probabilistic methods; registers; single event upset; software technique; super-scalar processors; system operation; transient fault; Computer architecture; Computer errors; Error correction; Event detection; Fault detection; Hardware; Parallel processing; Redundancy; Registers; Single event upset
Journal_Title :
Reliability, IEEE Transactions on