Title :
A strategy for soft error reduction in multi core designs
Author :
Hyman, Ransford, Jr. ; Bhattacharya, Koustav ; Ranganathan, Nagarajan
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
With the continuous decrease in minimum feature size and increase in chip density, modern processors are becoming increasingly susceptible to soft errors. In the past, lockstep execution with redundant threads on duplicated pipelines has been used for soft error rate reduction; it can achieve high error coverage, but at the cost of large overheads in area and performance. In this paper, we propose techniques for protection against soft errors in multi-core designs using (i) the properties of spatial and temporal redundancy and (ii) value based detection. We utilize temporal redundancy through the "latency use slack" (LSC) of an instruction, which we define as the number of cycles before the result computed by the instruction becomes the source operand of a subsequent instruction, while spatial redundancy is exploited by duplicating the instruction on a nearby idle processor core. Further, value based detection is explored by exploiting the width of operands with small data values and by generating residue code check bits for the source operands. When a soft error is detected, error correction is achieved by rolling back execution to a previous checkpoint state and re-executing the instructions. The proposed techniques have been implemented on the RSIM simulation framework and validated using the SPLASH benchmarks. Our results indicate that the soft error detection schemes proposed in this work can be implemented, on average, with less than a 10% increase in CPI on modern multi-core designs.
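Two ideas from the abstract, the use slack of an instruction and a residue code check on source operands, can be illustrated with a minimal sketch. This is not the authors' implementation: the instruction encoding, the function names `use_slack` and `checked_add`, the modulus of 3, and the simplification of counting instruction slots rather than cycles are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (destination register, list of source registers).
Instr = Tuple[Optional[str], List[str]]


def use_slack(trace: List[Instr], i: int) -> Optional[int]:
    """Distance, in instruction slots, from trace[i] to the first later
    instruction that reads its result. Assumes one instruction per cycle,
    which a real pipeline does not guarantee. Returns None when the result
    is never read or is overwritten before being read."""
    dest = trace[i][0]
    if dest is None:
        return None
    for j in range(i + 1, len(trace)):
        later_dest, later_srcs = trace[j]
        if dest in later_srcs:
            return j - i
        if later_dest == dest:
            return None  # value overwritten before any use
    return None


def residue(x: int, modulus: int = 3) -> int:
    """Residue check value of x for the given modulus."""
    return x % modulus


def checked_add(a: int, b: int, result: int, modulus: int = 3) -> bool:
    """True when `result` is consistent with a + b under the residue check.
    Overflow is ignored here because Python integers are unbounded."""
    expected = (residue(a, modulus) + residue(b, modulus)) % modulus
    return residue(result, modulus) == expected


if __name__ == "__main__":
    trace = [("r1", ["r2", "r3"]), ("r4", ["r5"]), ("r6", ["r1"])]
    print(use_slack(trace, 0))        # slots until r1 is first read
    print(checked_add(5, 9, 14))      # fault-free sum
    print(checked_add(5, 9, 14 ^ 4))  # sum with one bit flipped
```

In a scheme like the one described, a larger slack would leave more room to re-execute an instruction for comparison before its result is consumed. A modulus of 3 is a common choice for residue checks because a single bit flip changes a value by a power of two, which is never a multiple of 3, so the check always exposes it.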
Keywords :
error correction; microprocessor chips; radiation hardening (electronics); redundancy; RSIM simulation; SPLASH benchmarks; duplicated pipelines; error correction; latency use slack; multicore designs; processor core; soft error rate reduction; spatial redundancy; temporal redundancy; value based detection; Computer errors; Costs; Error analysis; Error correction; Logic; Multicore processing; Pipelines; Protection; Redundancy; Yarn;
Conference_Title :
Circuits and Systems, 2009. ISCAS 2009. IEEE International Symposium on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-3827-3
Electronic_ISBN :
978-1-4244-3828-0
DOI :
10.1109/ISCAS.2009.5118238