DocumentCode
358567
Title
Algorithm-based fault tolerance for spaceborne computing: basis and implementations
Author
Turmon, Michael ; Granat, Robert
Author_Institution
Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
Volume
4
fYear
2000
fDate
2000
Firstpage
411
Abstract
We describe and test the mathematical background for using checksum methods to validate results returned by a numerical subroutine operating in a fault-prone environment that causes unpredictable errors in data. We can treat subroutines whose results satisfy a necessary condition of a linear form; the checksum tests compliance with this necessary condition. These checksum schemes are called algorithm-based fault tolerance (ABFT). We discuss the theory and practice of setting numerical tolerances to separate errors caused by a fault from those inherent in finite-precision numerical calculations. Two series of tests are described. The first tests the general effectiveness of the linear ABFT schemes we propose, and the second verifies the correct behavior of our parallel implementation of them. We find that, under simulated fault conditions, it is possible to choose a fault detection scheme that for average-case matrices can detect 99% of faults with no false alarms, and that for a “worst-case” matrix population can detect 80% of faults with no false alarms.
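As an illustration of the kind of linear checksum test the abstract describes, the following minimal sketch checks a matrix product C = AB against the necessary condition Cw = A(Bw). It is not taken from the paper: the function names, the all-ones checksum vector `w`, the norm-based tolerance formula, and the `scale` factor are all assumptions chosen for illustration only.

```python
import sys

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Plain matrix product, standing in for the subroutine under test."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inf_norm(M):
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

def abft_check(A, B, C, w=None, scale=100.0):
    """Return True if C passes the checksum test for C = A B.

    The test compares C w with A (B w). The tolerance below is a
    hypothetical choice (machine epsilon times the operand norms times
    an assumed safety factor `scale`), not the paper's threshold.
    """
    if w is None:
        w = [1.0] * len(B[0])  # assumed checksum vector of all ones
    lhs = matvec(C, w)
    rhs = matvec(A, matvec(B, w))
    delta = max(abs(l - r) for l, r in zip(lhs, rhs))
    tol = (scale * sys.float_info.epsilon
           * inf_norm(A) * inf_norm(B) * max(abs(x) for x in w))
    return delta <= tol

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
clean_ok = abft_check(A, B, C)

# Simulate a fault by perturbing one entry of the result.
C_faulty = [row[:] for row in C]
C_faulty[0][1] += 1e-3
faulty_ok = abft_check(A, B, C_faulty)
```

A larger `scale` lowers the false-alarm rate at the cost of missing small faults, which is the trade-off the abstract refers to when it discusses setting numerical tolerances. A fixed all-ones `w` can also miss faults whose effects cancel within a row.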
Keywords
aerospace computing; error correction; parallel algorithms; singular value decomposition; software fault tolerance; subroutines; ROC; SVD; algorithm-based fault tolerance; checksum methods; error propagation; fault detection scheme; fault-prone environment; numerical subroutine; numerical tolerances; parallel implementation; spaceborne computing; unpredictable data errors; Computational modeling; Computer architecture; Fault detection; Fault tolerance; Learning systems; Machine learning algorithms; Propulsion; Single event transient; System testing; Telescopes
fLanguage
English
Publisher
IEEE
Conference_Titel
Aerospace Conference Proceedings, 2000 IEEE
Conference_Location
Big Sky, MT
ISSN
1095-323X
Print_ISBN
0-7803-5846-5
Type
conf
DOI
10.1109/AERO.2000.878453
Filename
878453