DocumentCode :
652279
Title :
Saving Time in a Program Robustness Evaluation
Author :
Gramacho, Joao ; Rexachs, Dolores ; Luque, Emilio
Author_Institution :
Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma de Barcelona, Barcelona, Spain
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
1274
Lastpage :
1282
Abstract :
The risk of having a program execution corrupted by transient faults is growing as computer processors use more transistors, become denser, and operate at lower voltages. This risk is multiplied in High Performance Computing, where hundreds or thousands of processors work together to solve a single problem. To evaluate how program executions behave in the presence of transient faults, we have proposed the concept of robustness against transient faults. This concept can be used to determine the more significant parts of a program with respect to the risk of misbehavior caused by transient faults, for further study of improvements. The robustness concept can also be used as a metric to compare different approaches applied to a program to make it less likely to produce corrupted results. In this work we present why and how it is possible to simplify a fraction of a program's robustness analysis by taking into account the repetition of sequences of instructions. The simplified analysis obtains exactly the same result as a full program robustness evaluation (exhaustive and without estimations). By simplifying the analysis we were able to reduce our previously published robustness analysis time by up to 192 times, and we were also able to evaluate larger programs in feasible time (unimaginable using executions in a fault-injection-capable environment).
Keywords :
microprocessor chips; computer processors; high performance computing; instruction sequences; program execution; program robustness evaluation; Absorption; Compression algorithms; Computer architecture; Program processors; Registers; Robustness; Transient analysis; Transient faults; reliability; robustness; simplification; soft errors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/TrustCom.2013.237
Filename :
6680974