Title :
Accurate microarchitecture-level fault modeling for studying hardware faults
Author :
Li, Man-Lap ; Ramachandran, Pradeep ; Karpuzcu, Ulya R. ; Hari, Siva Kumar Sastry ; Adve, Sarita V.
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Champaign, IL
Abstract :
Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault tolerance mechanisms across all levels of the system stack, from the device level to the system level. High-level fault tolerance solutions, such as at the microarchitecture and system levels, are commonly evaluated using statistical fault injections with microarchitecture-level fault models. Since hardware faults actually manifest at a much lower level, it is unclear if such high-level fault models are acceptably accurate. On the other hand, lower-level models, such as at the gate level, may be more accurate, but their increased simulation times make it hard to track the system-level propagation of faults. Thus, an evaluation of high-level reliability solutions entails the classical tradeoff between speed and accuracy. This paper seeks to quantify and alleviate this tradeoff. We make the following contributions: (1) We introduce SWAT-Sim, a novel fault injection infrastructure that uses hierarchical simulation to study the system-level manifestations of permanent (and transient) gate-level faults. For our experiments, SWAT-Sim incurs a small average performance overhead of under 3x, for the components we simulate, when compared to pure microarchitectural simulations. (2) We study system-level manifestations of faults injected under different microarchitecture-level and gate-level fault models and identify the reasons for the inability of microarchitecture-level faults to model gate-level faults in general. (3) Based on our analysis, we derive two probabilistic microarchitecture-level fault models to mimic gate-level stuck-at and delay faults. Our results show that these models are, in general, inaccurate as they do not capture the complex manifestation of gate-level faults.
The inaccuracies in existing models and the lack of more accurate microarchitecture-level models motivate using infrastructures similar to SWAT-Sim to faithfully model the microarchitecture-level effects of gate-level faults.
Keywords :
fault simulation; fault tolerant computing; SWAT-Sim; fault injection infrastructure; gate-level fault model; hardware faults; hardware reliability; hierarchical simulation; microarchitectural simulations; microarchitecture-level fault modeling; performance overhead; permanent gate-level faults; probabilistic microarchitecture-level fault model; system-level manifestation; transient gate-level faults; Analytical models; Circuit faults; Electrical fault detection; Fault detection; Fault tolerant systems; Hardware; Latches; Logic arrays; Microarchitecture; Redundancy;
Conference_Title :
High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on
Conference_Location :
Raleigh, NC
Print_ISBN :
978-1-4244-2932-5
DOI :
10.1109/HPCA.2009.4798242