Title :
Which Verification for Soft Error Detection?
Author :
Leonardo Bautista-Gomez;Anne Benoit;Aurélien Cavelan;Saurabh K. Raina;Yves Robert;Hongyang Sun
Author_Institution :
Argonne Nat. Lab., Argonne, IL, USA
Abstract :
Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with its own cost and recall (the fraction of all errors that are actually detected). The main contribution of this paper is to characterize the optimal computational pattern for an application: which detector(s) to use, how many detectors of each type to use, and the length of the work segment that precedes each of them. We conduct a comprehensive complexity analysis of this optimization problem, showing NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm whose performance is shown to be close to optimal for a realistic set of evaluation scenarios.
Keywords :
"Detectors","Protocols","Checkpointing","Greedy algorithms","Interpolation","Time series analysis","Redundancy"
Conference_Title :
High Performance Computing (HiPC), 2015 IEEE 22nd International Conference on
DOI :
10.1109/HiPC.2015.26