• DocumentCode
    358567
  • Title
    Algorithm-based fault tolerance for spaceborne computing: basis and implementations
  • Author
    Turmon, Michael ; Granat, Robert
  • Author_Institution
    Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
  • Volume
    4
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    411
  • Abstract
    We describe and test the mathematical background for using checksum methods to validate results returned by a numerical subroutine operating in a fault-prone environment that causes unpredictable errors in data. We can treat subroutines whose results satisfy a necessary condition of a linear form; the checksum tests compliance with this necessary condition. These checksum schemes are called algorithm-based fault tolerance (ABFT). We discuss the theory and practice of setting numerical tolerances to separate errors caused by a fault from those inherent in finite-precision numerical calculations. Two series of tests are described. The first tests the general effectiveness of the linear ABFT schemes we propose, and the second verifies the correct behavior of our parallel implementation of them. We find that under simulated fault conditions, it is possible to choose a fault detection scheme that for average-case matrices can detect 99% of faults with no false alarms, and that for a “worst-case” matrix population can detect 80% of faults with no false alarms.
  • Keywords
    aerospace computing; error correction; parallel algorithms; singular value decomposition; software fault tolerance; subroutines; ROC; SVD; algorithm-based fault tolerance; checksum methods; error propagation; fault detection scheme; fault-prone environment; numerical subroutine; numerical tolerances; parallel implementation; spaceborne computing; unpredictable data errors; Computational modeling; Computer architecture; Fault detection; Fault tolerance; Learning systems; Machine learning algorithms; Propulsion; Single event transient; System testing; Telescopes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Aerospace Conference Proceedings, 2000 IEEE
  • Conference_Location
    Big Sky, MT
  • ISSN
    1095-323X
  • Print_ISBN
    0-7803-5846-5
  • Type
    conf
  • DOI
    10.1109/AERO.2000.878453
  • Filename
    878453
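
As an illustration of the checksum test described in the Abstract, the following is a minimal sketch for one linear operation, the matrix product C = AB, whose result must satisfy the necessary condition Cw = A(Bw) for any checksum vector w. It is not taken from the paper: the function names, the all-ones checksum vector, and the tolerance formula (a `slack` factor times n, machine epsilon, and the infinity norms of A, B, and w) are assumptions chosen for illustration.

```python
import random
import sys


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def norm_inf_vec(v):
    """Largest absolute entry of a vector."""
    return max(abs(x) for x in v)


def norm_inf_mat(M):
    """Largest absolute row sum of a matrix."""
    return max(sum(abs(x) for x in row) for row in M)


def checksum_ok(A, B, C, w, slack=10.0):
    """Check the necessary condition C w = A (B w) up to a round-off tolerance.

    The tolerance below is an assumed, illustrative choice; it is not
    the tolerance derived in the paper.
    """
    lhs = matvec(C, w)
    rhs = matvec(A, matvec(B, w))
    residual = norm_inf_vec([l - r for l, r in zip(lhs, rhs)])
    eps = sys.float_info.epsilon
    tol = slack * len(w) * eps * norm_inf_mat(A) * norm_inf_mat(B) * norm_inf_vec(w)
    return residual <= tol


if __name__ == "__main__":
    random.seed(0)
    n = 8
    A = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    B = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    w = [1.0] * n

    C = matmul(A, B)
    print("fault-free result passes:", checksum_ok(A, B, C, w))

    C[2][3] += 1e-3  # simulated fault: corrupt a single entry of the result
    print("corrupted result passes:", checksum_ok(A, B, C, w))
```

The `slack` parameter plays the role of the numerical tolerance discussed in the Abstract: a larger value reduces false alarms from ordinary round-off but lets smaller faults go undetected.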