DocumentCode :
358567
Title :
Algorithm-based fault tolerance for spaceborne computing: basis and implementations
Author :
Turmon, Michael ; Granat, Robert
Author_Institution :
Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
Volume :
4
fYear :
2000
fDate :
2000
Firstpage :
411
Abstract :
We describe and test the mathematical background for using checksum methods to validate results returned by a numerical subroutine operating in a fault-prone environment that causes unpredictable errors in data. We can treat subroutines whose results satisfy a necessary condition of a linear form; the checksum tests compliance with this necessary condition. These checksum schemes are called algorithm-based fault tolerance (ABFT). We discuss the theory and practice of setting numerical tolerances to separate errors caused by a fault from those inherent in finite-precision numerical calculations. Two series of tests are described. The first tests the general effectiveness of the linear ABFT schemes we propose, and the second verifies the correct behavior of our parallel implementation of them. We find that under simulated fault conditions, it is possible to choose a fault detection scheme that for average-case matrices can detect 99% of faults with no false alarms, and that for a “worst-case” matrix population can detect 80% of faults with no false alarms.
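To make the idea of a linear checksum test concrete, here is a minimal Python sketch. It assumes the protected subroutine is a matrix product C = A B and uses the classic all-ones row checksum, whose linear necessary condition is w^T C = (w^T A) B. The function names (`weighted_row_sum`, `matmul`, `checksum_ok`), the all-ones weight vector, and the round-off tolerance heuristic are illustrative assumptions, not the authors' implementation or their calibrated thresholds.

```python
import sys

def weighted_row_sum(w, M):
    """Return the row vector w^T M for a matrix M stored as a list of rows."""
    return [sum(w[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def matmul(A, B):
    """Plain product A B; stands in for the fault-prone subroutine being checked."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(len(A))]

def checksum_ok(A, B, C, tol_factor=10.0):
    """Check the linear necessary condition w^T C = (w^T A) B with w = all ones.

    Returns True if every column checksum agrees within a round-off tolerance.
    The tolerance tol_factor * eps * (rows + inner) * |w|^T |A| |B| is an
    assumed heuristic, not a threshold taken from the paper.
    """
    eps = sys.float_info.epsilon
    rows, inner = len(A), len(B)
    w = [1.0] * rows
    lhs = weighted_row_sum(w, C)                        # checksum of the returned result
    rhs = weighted_row_sum(weighted_row_sum(w, A), B)   # same checksum predicted from inputs
    abs_a = [[abs(x) for x in row] for row in A]
    abs_b = [[abs(x) for x in row] for row in B]
    scale = weighted_row_sum(weighted_row_sum(w, abs_a), abs_b)  # |w|^T |A| |B|
    return all(abs(l - r) <= tol_factor * eps * (rows + inner) * s
               for l, r, s in zip(lhs, rhs, scale))
```

Raising `tol_factor` reduces false alarms caused by finite-precision round-off but lets smaller fault-induced errors pass undetected. This is the same trade-off the abstract describes as setting numerical tolerances.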
Keywords :
aerospace computing; error correction; parallel algorithms; singular value decomposition; software fault tolerance; subroutines; ROC; SVD; algorithm-based fault tolerance; checksum methods; error propagation; fault detection scheme; fault-prone environment; numerical subroutine; numerical tolerances; parallel implementation; spaceborne computing; unpredictable data errors; Computational modeling; Computer architecture; Fault detection; Fault tolerance; Learning systems; Machine learning algorithms; Propulsion; Single event transient; System testing; Telescopes;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Aerospace Conference Proceedings, 2000 IEEE
Conference_Location :
Big Sky, MT
ISSN :
1095-323X
Print_ISBN :
0-7803-5846-5
Type :
conf
DOI :
10.1109/AERO.2000.878453
Filename :
878453