DocumentCode :
2125049
Title :
Towards analyzing and improving robustness of software applications to intermittent and permanent faults in hardware
Author :
Sharma, Ashok ; Sloan, Jeff ; Wanner, Lucas F. ; Elmalaki, Salma H. ; Srivastava, Mani B. ; Gupta, Puneet
Author_Institution :
Electr. Eng. Dept., UCLA, Los Angeles, CA, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
435
Lastpage :
438
Abstract :
Although a significant fraction of emerging failure and wearout mechanisms result in intermittent or permanent faults in hardware, their impact (as distinct from transient faults) on software applications has not been well studied. In this paper, we develop a distinguishing application characteristic, referred to as similarity, from a fundamental circuit-level understanding of the failure mechanisms. We present a mathematical definition and a procedure for similarity computation for practical software applications and experimentally verify the relationship between similarity and fault rate. Leveraging the dependence of application robustness on the similarity metric, we present example architecture-independent code transformations to reduce similarity and thereby the worst-case fault rate with minimal performance degradation. Our experimental results with arithmetic unit faults show as much as 74% improvement in the worst-case fault rate on benchmark kernels, with less than 10% runtime penalty.
Keywords :
digital arithmetic; program compilers; program diagnostics; software architecture; software fault tolerance; system recovery; architecture independent code transformation; arithmetic unit fault; benchmark kernels; circuit-level understanding; failure mechanism; intermittent hardware fault; mathematical definition; performance degradation; permanent hardware fault; robustness analysis; robustness improvement; runtime penalty; similarity computation; software application; transient fault; wearout mechanism; worst-case fault rate; Circuit faults; Computer architecture; Conferences; Delays; Hardware; Software; Vectors; Permanent fault; code transformation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Design (ICCD), 2013 IEEE 31st International Conference on
Conference_Location :
Asheville, NC
Type :
conf
DOI :
10.1109/ICCD.2013.6657076
Filename :
6657076