DocumentCode :
3516557
Title :
Low overhead Soft Error Mitigation techniques for high-performance and aggressive systems
Author :
Avirneni, Naga Durga Prasad ; Subramanian, Viswanathan ; Somani, Arun K.
Author_Institution :
Dependable Comput. & Networking Lab., Iowa State Univ., Ames, IA, USA
fYear :
2009
fDate :
June 29 2009-July 2 2009
Firstpage :
185
Lastpage :
194
Abstract :
The threat of soft-error-induced system failure in high-performance computing systems has become more prominent as we adopt ultra-deep submicron process technologies. In this paper, we propose two techniques, namely soft error mitigation (SEM) and soft and timing error mitigation (STEM), for protecting combinational logic blocks from soft errors. Our first technique (SEM), based on distributed and temporal voting of three registers, moves the soft error detection overhead off the critical path of the system. Our second technique (STEM) adds timing error detection capability to guarantee reliable execution in aggressively clocked designs that enhance system performance by operating beyond the worst-case clock frequency. We also present a specialized low-overhead clock generation scheme that supports our proposed techniques. Timing-annotated gate-level simulations, using 45 nm libraries, of a pipelined adder-multiplier and a DLX processor show that both techniques achieve nearly 100% fault coverage. For the DLX processor, even under severe fault injection campaigns, SEM achieves an average performance improvement of 26.58% over a conventional soft error mitigation scheme based on a triple modular redundancy voter, while STEM outperforms SEM by 27.42%.
Keywords :
adders; clocks; combinational circuits; error detection; fault tolerance; logic design; microprocessor chips; multiplying circuits; transistor circuits; DLX processor; clock frequency; combinational logic block protection; distributed voting; fault tolerance; low overhead clock generation scheme; low overhead soft error mitigation technique; nanosized transistor; pipelined adder-multiplier; size 45 nm; soft and timing error mitigation techniques; soft error induced system failure; temporal voting; timing annotated gate level simulation; timing error detection capability; triple modular redundancy voter; ultra-deep submicron process technologies; Clocks; Frequency; High performance computing; Libraries; Logic; Protection; Redundancy; System performance; Timing; Voting; Dependable and Adaptive Systems; Overclocking; Parameter Variations; Soft Error;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems & Networks, 2009. DSN '09. IEEE/IFIP International Conference on
Conference_Location :
Lisbon
Print_ISBN :
978-1-4244-4422-9
Electronic_ISBN :
978-1-4244-4421-2
Type :
conf
DOI :
10.1109/DSN.2009.5270340
Filename :
5270340