Title :
Exploiting Data Representation for Fault Tolerance
Author :
Elliott, James ; Hoemmen, Mark ; Mueller, Frank
Author_Institution :
Comput. Sci. Dept., North Carolina State Univ., Raleigh, NC, USA
Abstract :
We explore the link between data representation and soft errors in dot products. We present an analytic model for the absolute error introduced when a soft error corrupts a bit in an IEEE-754 floating-point number. We show how this finding relates to the fundamental linear algebra concepts of normalization and matrix equilibration. We present a case study illustrating that the probability of experiencing a large error in a dot product is minimized when both vectors are normalized. Furthermore, we show that when data is normalized, the absolute error is either less than one or very large, which allows us to detect large errors. We demonstrate how this finding can be used by instrumenting the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase, and show that when scaling is used the absolute error can be bounded above by one.
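The two error regimes described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's analytic model and does not reproduce its bound. It only flips each bit of a single binary64 scalar and compares the result with the original value. The helper names `flip_bit` and `flip_errors`, the sample value 0.75, and the cutoff 2.0 separating "small" from "huge" errors are assumptions made for this illustration.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE-754 binary64 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 are the
    exponent field, and bit 63 is the sign bit.
    """
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def flip_errors(x: float) -> list:
    """Absolute error |flip(x) - x| for each of the 64 bit positions."""
    return [abs(flip_bit(x, b) - x) for b in range(64)]


# Hypothetical sample value with magnitude below one, as in a normalized vector.
errors = flip_errors(0.75)

# Illustrative cutoff (an assumption of this sketch, not the paper's bound).
# "not e <= 2.0" also catches inf and NaN results for other inputs.
small = [b for b, e in enumerate(errors) if e <= 2.0]
huge = [b for b, e in enumerate(errors) if not e <= 2.0]

print("bits with small error:", len(small))
print("bits with huge error:", huge)
print("largest small error:", max(errors[b] for b in small))
```

For this sample value, only a flip of the most significant exponent bit lands in the huge regime. Every other flip, including the sign bit, changes the value by a small bounded amount, which is what makes a simple magnitude check a plausible detector for the large errors.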
Keywords :
data structures; fault tolerance; floating point arithmetic; iterative methods; matrix algebra; probability; GMRES iterative solver; IEEE-754 floating-point number; analytic model; data representation; dot products; generalized minimum residual method; linear algebra concepts; matrix equilibration; normalization; orthogonalization phase; soft errors; Algorithm design and analysis; Analytical models; Computational modeling; Data models; Reliability; Transient analysis; Vectors
Conference_Title :
Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2014 5th Workshop on
Conference_Location :
New Orleans, LA
DOI :
10.1109/ScalA.2014.5