  • DocumentCode
    776178
  • Title
    An algorithm-based error detection scheme for the multigrid method
  • Author
    Mishra, Amitabh ; Banerjee, Prithviraj
  • Author_Institution
    Bradley Dept. of Electr. & Comput. Eng., Virginia Polytech. Inst. & State Univ., Blacksburg, VA, USA
  • Volume
    52
  • Issue
    9
  • fYear
    2003
  • Firstpage
    1089
  • Lastpage
    1099
  • Abstract
    Algorithm-based fault tolerance (ABFT) is a technique for providing low-cost, system-level error detection and correction on array processors and multiprocessors. Since the early 1980s, the technique has been applied extensively to linear algebraic algorithms such as matrix multiplication, Gaussian elimination, QR factorization, and singular value decomposition. An important class of problems in numerical linear algebra has, however, been overlooked: the iterative solution of the linear algebraic equations that arise from the finite difference or finite element discretization of a partial differential equation. The only exception is the recent application of algorithm-based error detection (ABED) encodings to the successive overrelaxation algorithm for Laplace's equation. In this paper, ABED is applied to a multigrid algorithm for the iterative solution of a two-dimensional Poisson equation. Invariants are created to implement checking in the relaxation, restriction, and interpolation operators. Roundoff errors accumulated within the operators perturb these invariants and often cause false alarms; this is addressed by deriving expressions for the roundoff errors in the algebraic processes of the operators and correcting the invariants accordingly. The ABED-encoded multigrid algorithm is shown to be insensitive to the size and range of the input data, and it provides excellent error coverage at low latency for floating-point, integer, and memory errors.
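    The abstract does not reproduce the paper's invariants. As a hedged illustration of the general checksum idea behind ABED for a linear grid operator, the sketch below checks a 2D full-weighting restriction: because restriction is linear, the sum of the coarse-grid outputs must equal a weighted sum of the fine-grid inputs, with weights w = R^T 1 precomputed once. The stencil, the grid layout (a (2n+1)x(2n+1) fine grid with zero Dirichlet boundaries), the function names, and the crude worst-case roundoff tolerance are all assumptions for illustration, not the authors' encodings or their derived roundoff expressions.

```python
# Hypothetical illustration of a checksum invariant for a linear multigrid
# operator (2D full-weighting restriction). Not the paper's encoding.
import sys

# Full-weighting stencil; all weights are exact binary fractions.
STENCIL = [[1 / 16, 2 / 16, 1 / 16],
           [2 / 16, 4 / 16, 2 / 16],
           [1 / 16, 2 / 16, 1 / 16]]


def restrict(fine):
    """Restrict a (2n+1)x(2n+1) fine grid to an (n+1)x(n+1) coarse grid.

    Only coarse interior points are computed; boundary values stay 0.0
    (assumed homogeneous Dirichlet boundaries).
    """
    nc = (len(fine) - 1) // 2 + 1
    coarse = [[0.0] * nc for _ in range(nc)]
    for ci in range(1, nc - 1):
        for cj in range(1, nc - 1):
            i, j = 2 * ci, 2 * cj  # fine-grid point under the coarse point
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += STENCIL[di + 1][dj + 1] * fine[i + di][j + dj]
            coarse[ci][cj] = acc
    return coarse


def checksum_weights(nf):
    """Column sums of the restriction matrix, w = R^T 1, computed once."""
    nc = (nf - 1) // 2 + 1
    w = [[0.0] * nf for _ in range(nf)]
    for ci in range(1, nc - 1):
        for cj in range(1, nc - 1):
            i, j = 2 * ci, 2 * cj
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    w[i + di][j + dj] += STENCIL[di + 1][dj + 1]
    return w


def check_restriction(fine, coarse, w, slack=4.0):
    """Invariant: sum(coarse) == sum(w * fine) up to a roundoff tolerance.

    The tolerance is a crude worst-case heuristic (slack * N * eps * scale),
    not the roundoff expressions derived in the paper.
    """
    nf = len(fine)
    lhs = sum(sum(row) for row in coarse)
    terms = [w[i][j] * fine[i][j] for i in range(nf) for j in range(nf)]
    rhs = sum(terms)
    scale = sum(abs(t) for t in terms)
    tol = slack * nf * nf * sys.float_info.epsilon * scale
    return abs(lhs - rhs) <= tol


if __name__ == "__main__":
    nf = 9  # fine grid with n = 4
    fine = [[((i * 7 + j * 3) % 5) + 0.1 * i for j in range(nf)]
            for i in range(nf)]
    w = checksum_weights(nf)
    coarse = restrict(fine)
    print("clean check:", check_restriction(fine, coarse, w))
    coarse[2][2] += 1.0  # inject a single-value error into the output
    print("after fault:", check_restriction(fine, coarse, w))
```

    Running the script should print True for the clean grid and False once the single coarse value is perturbed. Precomputing w keeps the check to one extra weighted sum per operator application; a practical scheme would likely also need to protect w itself and replace the heuristic tolerance with tighter, derived roundoff bounds, which is the problem the paper addresses.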
  • Keywords
    Poisson equation; fault tolerant computing; finite difference methods; finite element analysis; interpolation; iterative methods; mathematical operators; multiprocessing systems; parallel processing; relaxation theory; roundoff errors; system recovery; ABED; algorithm-based error detection; algorithm-based fault tolerance; array processors; error coverage; finite difference discretization; finite element discretization; floating-point errors; integer errors; interpolation operator; iterative solution; latency; linear algebraic equations; memory errors; multigrid method; multiprocessors; numerical linear algebra; partial differential equation; relaxation; restriction operator; system level error detection; two dimensional Poisson equation; Costs; Error correction; Fault detection; Fault tolerant systems; Iterative algorithms; Laplace equations; Matrices; Multigrid methods; Partial differential equations; Roundoff errors;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2003.1228507
  • Filename
    1228507