Title :
Towards Formal Approaches to System Resilience
Author :
Sharma, Vishal Chandra ; Haran, Avner ; Rakamaric, Zvonimir ; Gopalakrishnan, Ganesh
Author_Institution :
Sch. of Comput., Univ. of Utah, Salt Lake City, UT, USA
Abstract :
Technology scaling and techniques such as dynamic voltage/frequency scaling are predicted to increase the number of transient faults in future processors. Error detectors implemented in hardware are often energy inefficient, as they are "always on." While software-level error detection can augment hardware-level detectors, creating detectors in software that are highly effective remains a challenge. In this paper, we first present anew LLVM-level fault injector called KULFI that helps simulate faults occurring within CPU state elements in a versatile manner. Second, using KULFI, we study the behavior of a family of well-known and simple algorithms under error injection. (We choose a family of sorting algorithms for this study.) We then propose a promising way to interpret our empirical results using a formal model that builds on the idea of predicate state transition diagrams. After introducing the basic abstraction underlying our predicate transition diagrams, we draw connections to the level of resilience empirically observed during fault injection studies. Building on the observed connections, we develop a simple, and yet effective, predicate-abstraction-based fault detector. While in its initial stages, ours is believed to be the first study that offers a formal way to interpret and compare fault injection results obtained from algorithms from within one family. Given the absolutely unpredictable nature of what a fault can do to a computation in general, our approach may help designers choose amongst a class of algorithms one that behaves most resilient of all.
Keywords :
error detection; fault tolerant computing; formal specification; CPU state elements; KULFI; LLVM-level fault injector; dynamic frequency scaling; dynamic voltage scaling; error detectors; error injection; fault injection; formal approaches; formal model; hardware-level detectors; predicate state transition diagrams; predicate-abstraction-based fault detector; software-level error detection; system resilience; technology scaling; transient faults; Arrays; Circuit faults; Hardware; Registers; Resilience; Sorting; Transient analysis; Fault Tolerance; Formal Approach; System Resilience;
Conference_Titel :
Dependable Computing (PRDC), 2013 IEEE 19th Pacific Rim International Symposium on
Conference_Location :
Vancouver, BC
DOI :
10.1109/PRDC.2013.14