Title :
Temporal memoization for energy-efficient timing error recovery in GPGPUs
Author :
Rahimi, Azar ; Benini, Luca ; Gupta, R.K.
Author_Institution :
CSE, UC San Diego, San Diego, CA, USA
Abstract :
Manufacturing and environmental variability lead to timing errors in computing systems that are typically corrected by error detection and correction mechanisms at the circuit level. The cost and speed of recovery can be improved by memoization-based optimization methods that exploit spatial or temporal parallelism in suitable computing fabrics such as general-purpose graphics processing units (GPGPUs). We propose a temporal memoization technique for floating-point units (FPUs) in GPGPUs that exploits value locality inside data-parallel programs. The technique recalls (memorizes) the context of error-free execution of an instruction on an FPU. To enable scalable and independent recovery, a single-cycle lookup table (LUT) is tightly coupled to every FPU to maintain the contexts of recent error-free executions. The LUT reuses these memorized contexts to correct errant FP instructions exactly or approximately, depending on application needs. In real-world applications, the temporal memoization technique achieves an average energy saving of 8%-28% over a wide range of timing error rates (0%-4%) and outperforms recent advances in resilient architectures. The technique also enhances robustness in the voltage overscaling regime, achieving a relative average energy saving of 66% with 11% voltage overscaling.
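The per-FPU lookup described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `MemoLUT`, the `execute` helper, the FIFO replacement policy, the two-entry table size, and the approximate matching by masking low-order operand bits are all hypothetical choices made for illustration. A timing error on a LUT miss is assumed to be handled by some baseline recovery mechanism that is not modeled here.

```python
import struct
from collections import OrderedDict


def f32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


class MemoLUT:
    """Hypothetical model of a small table of recent error-free FP results."""

    def __init__(self, entries=2, masked_bits=0):
        self.entries = entries
        # masked_bits > 0 ignores low-order operand bits (approximate match);
        # masked_bits == 0 requires bit-exact operands (exact match).
        self.mask = 0xFFFFFFFF & ~((1 << masked_bits) - 1)
        self.table = OrderedDict()  # insertion order gives FIFO replacement

    def _key(self, opcode, operands):
        return (opcode,) + tuple(f32_bits(v) & self.mask for v in operands)

    def lookup(self, opcode, operands):
        return self.table.get(self._key(opcode, operands))

    def update(self, opcode, operands, result):
        key = self._key(opcode, operands)
        if key not in self.table and len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict the oldest entry
        self.table[key] = result


def execute(lut, opcode, operands, fpu, timing_error=False):
    """Run one FP instruction, reusing a memorized result on a LUT hit."""
    hit = lut.lookup(opcode, operands)
    if hit is not None:
        # Hit: the stored error-free result is reused, so a timing error
        # in the FPU for this instruction does not need separate recovery.
        return hit, "reused"
    result = fpu(*operands)
    if timing_error:
        # Miss with an error: assumed to fall back to a baseline recovery
        # mechanism (not modeled); the context is not memorized.
        return result, "baseline_recovery"
    lut.update(opcode, operands, result)  # memorize error-free context
    return result, "executed"
```

For example, executing `FMUL` on `(1.5, 2.0)` without an error stores the result; a later `FMUL` on the same operands that suffers a timing error is served from the table. With `masked_bits=12`, operands that differ only in the low mantissa bits also match, which models the approximate mode at the cost of a small numerical error.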
Keywords :
error detection; fault tolerant computing; floating point arithmetic; graphics processing units; power aware computing; synchronisation; table lookup; FPUs; GPGPUs; LUT; circuit level; correction mechanisms; data-parallel programs; energy-efficient timing error recovery; error-free execution; floating-point units; general-purpose graphics processing units; memoization-based optimization methods; relative average energy saving; single-cycle lookup table; temporal memoization; timing error rates; voltage overscaling regime; Clocks; Error analysis; Pipelines; Registers; Timing; GPGPU; Variability; error recovery; timing errors; value locality
Conference_Title :
Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014
Conference_Location :
Dresden
DOI :
10.7873/DATE.2014.113