Title :
Evaluating the Impact of SDC on the GMRES Iterative Solver
Author :
Elliott, James ; Hoemmen, Mark ; Mueller, Frank
Author_Institution :
Comput. Sci. Dept., North Carolina State Univ., Raleigh, NC, USA
Abstract :
Increasing parallelism and transistor density, along with increasingly tight energy and peak power constraints, may force exposure of occasionally incorrect computation or storage to application codes. Silent data corruption (SDC) will likely be infrequent, yet one SDC suffices to make numerical algorithms like iterative linear solvers cease progress towards the correct answer. Thus, we focus on the resilience of the iterative linear solver GMRES to a single transient SDC. We derive inexpensive checks to detect the effects of an SDC in GMRES; these checks work for a more general SDC model than one that presumes a bit flip. Our experiments show that when GMRES is used as the inner solver of an inner-outer iteration, it can "run through" SDC of almost any magnitude in the computationally intensive orthogonalization phase. That is, it gets the right answer using faulty data without requiring any rollback. Those SDCs that it cannot run through are caught by our detection scheme.
Keywords :
data handling; iterative methods; software fault tolerance; FT-GMRES algorithm; GMRES iterative linear solver; application codes; computationally intensive orthogonalization phase; energy power constraints; fault detection scheme; fault-tolerant GMRES algorithm; generalized minimal residual method; inner-outer iteration; numerical algorithms; peak power constraints; silent data corruption; single transient SDC; transistor density; Algorithm design and analysis; Computational modeling; Hardware; Iterative methods; Reliability; Transient analysis; Vectors; Fault Tolerance; Numerical Analysis; Silent Data Corruption;
Conference_Title :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-3799-8
DOI :
10.1109/IPDPS.2014.123