DocumentCode :
381323
Title :
Fault injection experiment results in space borne parallel application programs
Author :
Some, Raphael R. ; Kim, Won S. ; Khanoyan, Garen ; Callum, Leslie ; Agrawal, Anil ; Beahan, John J. ; Shamilian, Arshaluys ; Nikora, Allen
Author_Institution :
Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
Volume :
5
fYear :
2002
fDate :
2002
Firstpage :
85224
Abstract :
Development of the REE Commercial-Off-The-Shelf (COTS) based space-borne supercomputer requires detailed knowledge of system behavior in the presence of Single Event Upset (SEU)-induced faults. When combined with a hardware radiation fault model and mission environment data in a medium-grained system model, experimentally obtained fault behavior data can be used to predict system reliability, availability, and performance; determine optimal fault detection methods and boundaries; and define high-ROI fault tolerance strategies. The REE project has developed a suite of fault injection tools and a methodology for experimentally determining system behavior statistics in the presence of application-level, SEU-induced transient faults. Initial characterization of science data application code for an autonomous Mars Rover geology application indicates that this code is relatively insensitive to SEUs and can therefore be made highly immune to application-level faults with relatively low-overhead strategies.
Keywords :
aerospace computing; fault tolerant computing; parallel machines; parallel programming; radiation effects; software performance evaluation; software reliability; REE COTS based space-borne supercomputer; SEU induced faults; application level SEU induced transient faults; autonomous Mars Rover geology application; fault behavior data; fault injection tool suite; fault tolerance strategies; hardware radiation fault model; medium grained system model; mission environment data; optimal fault detection methods; single event upsets; space borne parallel application programs; system availability; system behavior statistics; system performance; system reliability; Availability; Fault detection; Fault tolerant systems; Hardware; Mars; Predictive models; Reliability; Single event upset; Statistics; Supercomputers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Aerospace Conference Proceedings, 2002. IEEE
Print_ISBN :
0-7803-7231-X
Type :
conf
DOI :
10.1109/AERO.2002.1035379
Filename :
1035379