DocumentCode :
734987
Title :
Efficient on-line fault-tolerance for the preconditioned conjugate gradient method
Author :
Scholl, Alexander ; Braun, Claus ; Kochte, Michael A. ; Wunderlich, Hans-Joachim
Author_Institution :
Inst. of Comput. Archit. & Comput. Eng., Univ. of Stuttgart, Stuttgart, Germany
fYear :
2015
fDate :
6-8 July 2015
Firstpage :
95
Lastpage :
100
Abstract :
Linear system solvers are key components of many scientific applications, and they can benefit significantly from modern heterogeneous computer architectures. However, the nano-scaled CMOS devices in such architectures face an increasing number of reliability threats, which makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient method (PCG) is a very popular solver since it typically finds solutions faster than direct methods, and it is less vulnerable to transient effects. However, as recent research shows, the vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions, or particle radiation can increase execution times considerably or corrupt solutions without indication. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method applies only two inner products to reliably detect errors. In case of errors, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces the error detection overhead and avoids expensive re-computations.
Keywords :
CMOS integrated circuits; conjugate gradient methods; fault simulation; nanoelectronics; radiation hardening (electronics); error detection overhead; execution times; fault tolerance mandatory; fault-tolerant PCG method; harsh operating conditions; linear system solvers; modern heterogeneous computer architectures; nanoscaled CMOS devices; on-line fault-tolerance; particle radiation; preconditioned conjugate gradient method; reliability threats; single errors; transient effects; Approximation algorithms; Approximation methods; Error correction; Fault tolerance; Fault tolerant systems; Gradient methods; Sparse matrices; ABFT; Fault Tolerance; Preconditioned Conjugate Gradient; Sparse Linear System Solving;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2015 IEEE 21st International On-Line Testing Symposium (IOLTS)
Conference_Location :
Halkidiki
Type :
conf
DOI :
10.1109/IOLTS.2015.7229839
Filename :
7229839