DocumentCode
1783393
Title
F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability
Author
Guan, Qiang ; DeBardeleben, Nathan ; Blanchard, Sean ; Fu, Song
Author_Institution
Ultrascale Systems Research Center, Los Alamos National Laboratory, Los Alamos, NM, USA
fYear
2014
fDate
19-23 May 2014
Firstpage
1245
Lastpage
1254
Abstract
As the high performance computing (HPC) community continues to push towards exascale computing, resilience remains a serious challenge. As feature sizes and operating voltages decrease, we expect a significant increase in hardware soft errors. Today's HPC applications are affected by soft errors only to a small degree, but we expect this to become a more serious issue as HPC systems grow. We propose F-SEFI, a Fine-grained Soft Error Fault Injector, as a tool for profiling software robustness against soft errors. In this paper we use soft error injection to mimic the impact of errors on logic circuit behavior. Leveraging the open source virtual machine hypervisor QEMU, F-SEFI enables users to modify emulated machine instructions to introduce soft errors. F-SEFI lets users control which application and which sub-function to target, and when and how to inject soft errors, at different granularities, without interfering with other applications that share the same environment. F-SEFI does this without requiring modifications to the application source code, compilers or operating systems. We discuss the design constraints for F-SEFI and the specifics of our implementation. We demonstrate use cases of F-SEFI on several benchmark applications to show how data corruption can propagate to incorrect results.
Keywords
fault tolerant computing; operating systems (computers); parallel processing; program compilers; public domain software; virtual machines; F-SEFI; HPC applications; QEMU; application source code; compilers; data corruption; design constraints; exascale computing; feature size; fine-grained soft error fault injection tool; high performance computing community; open source virtual machine hypervisor; operating systems; operating voltage; profiling application vulnerability; soft errors; software robustness profiling; Benchmark testing; Circuit faults; Hardware; Probes; Registers; Virtual machine monitors; Virtual machining; High Performance Computing; fault injection; resilience; soft error; vulnerability;
fLanguage
English
Publisher
IEEE
Conference_Title
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.128
Filename
6877352