Title :
Calculating Architectural Vulnerability Factors for Spatial Multi-Bit Transient Faults
Author :
Wilkening, Mark ; Sridharan, Vilas ; Li, Si ; Previlon, Fritz ; Gurumurthi, Sudhanva ; Kaeli, David R.
Author_Institution :
ECE Dept., Northeastern Univ., Boston, MA, USA
Abstract :
Reliability is an important design constraint in modern microprocessors, and one of the fundamental reliability challenges is combating the effects of transient faults. This requires extensive analysis, including significant fault modeling, to allow architects to make informed reliability tradeoffs. Recent data shows that multi-bit transient faults are becoming more common, increasing from 0.5% of static random-access memory (SRAM) faults in 180nm to 3.9% in 22nm. Such faults are predicted to be even more prevalent in smaller technology nodes. Therefore, accurately modeling the effects of multi-bit transient faults is increasingly important to the microprocessor design process. Architectural vulnerability factor (AVF) analysis is a method for modeling the effects of single-bit transient faults. In this paper, we propose a method to calculate AVFs for spatial multi-bit transient faults (MB-AVFs) and provide insights that can help reduce the impact of these faults. First, we describe a novel multi-bit AVF analysis approach for detected uncorrected errors (DUEs) and show how to measure DUE MB-AVFs in a performance simulator. We then extend our approach to measure silent data corruption (SDC) MB-AVFs. We find that MB-AVFs are not derivable from single-bit AVFs. We also find that larger fault modes have higher MB-AVFs. Finally, we present a case study on using MB-AVF analysis to optimize processor design, yielding SDC reductions of 86% in a GPU vector register file.
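The claim that MB-AVFs are not derivable from single-bit AVFs can be illustrated with a toy model. This is a minimal sketch, not the authors' method: it assumes a hypothetical per-cycle ACE trace for each bit, models a spatial multi-bit fault as hitting `width` contiguous bits with every placement equally likely, and counts a placement as vulnerable (a DUE, assuming the fault is always detected) whenever at least one bit it covers holds ACE state. The names `single_bit_avf` and `mb_avf_contiguous` are illustrative only.

```python
from typing import List

# Hypothetical ACE trace: ace[c][b] is True if bit b holds ACE state in cycle c.
AceTrace = List[List[bool]]


def single_bit_avf(ace: AceTrace) -> float:
    """Classic single-bit AVF: fraction of bit-cycles that are ACE."""
    cycles, bits = len(ace), len(ace[0])
    ace_bit_cycles = sum(sum(row) for row in ace)
    return ace_bit_cycles / (cycles * bits)


def mb_avf_contiguous(ace: AceTrace, width: int) -> float:
    """Toy DUE MB-AVF for a fault that flips `width` contiguous bits.

    Assumptions (not from the paper): every contiguous placement of the
    fault is equally likely, the fault is always detected, and a placement
    counts as vulnerable in a cycle if any bit it covers is ACE.
    """
    cycles, bits = len(ace), len(ace[0])
    placements = bits - width + 1
    vulnerable = 0
    for row in ace:
        for start in range(placements):
            if any(row[start:start + width]):
                vulnerable += 1
    return vulnerable / (cycles * placements)


# Two 4-bit structures observed for one cycle, each with two ACE bits.
clustered = [[True, True, False, False]]
spread = [[True, False, True, False]]

for name, trace in (("clustered", clustered), ("spread", spread)):
    print(name, single_bit_avf(trace), mb_avf_contiguous(trace, 2))
```

Both traces have a single-bit AVF of 0.5, yet under these assumptions the clustered trace has a 2-bit MB-AVF of 2/3 while the spread trace has 1.0: the multi-bit result depends on how ACE bits are arranged in space, not only on how many there are.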
Keywords :
SRAM chips; fault tolerant computing; graphics processing units; microprocessor chips; reliability; AVF analysis; DUE; GPU vector register file; SDC MB-AVF; SRAM faults; architectural vulnerability factors; detected uncorrected errors; fault modeling; microprocessors; multibit AVF analysis approach; reliability; silent data corruption MB-AVF; single-bit transient faults; spatial multibit transient faults; static random-access memory faults; Analytical models; Error analysis; Hardware; Life estimation; Random access memory; Reliability; Transient analysis; fault tolerance; reliability; soft errors;
Conference_Title :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
DOI :
10.1109/MICRO.2014.15