DocumentCode :
1783393
Title :
F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability
Author :
Guan, Qiang ; DeBardeleben, Nathan ; Blanchard, Sean ; Fu, Song
Author_Institution :
Ultrascale Systems Research Center, Los Alamos National Laboratory, Los Alamos, NM, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
1245
Lastpage :
1254
Abstract :
As the high performance computing (HPC) community continues to push towards exascale computing, resilience remains a serious challenge. With the expected decrease of both feature size and operating voltage, we expect a significant increase in hardware soft errors. HPC applications of today are only affected by soft errors to a small degree but we expect that this will become a more serious issue as HPC systems grow. We propose F-SEFI, a Fine-grained Soft Error Fault Injector, as a tool for profiling software robustness against soft errors. In this paper we utilize soft error injection to mimic the impact of errors on logic circuit behavior. Leveraging the open source virtual machine hypervisor QEMU, F-SEFI enables users to modify emulated machine instructions to introduce soft errors. F-SEFI can control what application, which sub-function, when and how to inject soft errors with different granularities, without interference to other applications that share the same environment. F-SEFI does this without requiring revisions to the application source code, compilers or operating systems. We discuss the design constraints for F-SEFI and the specifics of our implementation. We demonstrate use cases of F-SEFI on several benchmark applications to show how data corruption can propagate to incorrect results.
Keywords :
fault tolerant computing; operating systems (computers); parallel processing; program compilers; public domain software; virtual machines; F-SEFI; HPC applications; QEMU; application source code; compilers; data corruption; design constraints; exascale computing; feature size; fine-grained soft error fault injection tool; high performance computing community; open source virtual machine hypervisor; operating systems; operating voltage; profiling application vulnerability; soft errors; software robustness profiling; Benchmark testing; Circuit faults; Hardware; Probes; Registers; Virtual machine monitors; Virtual machining; High Performance Computing; fault injection; resilience; soft error; vulnerability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.128
Filename :
6877352