DocumentCode
1013957
Title
The SURE approach to reliability analysis
Author
Butler, Ricky W.
Author_Institution
NASA Langley Res. Center, Hampton, VA, USA
Volume
41
Issue
2
fYear
1992
fDate
6/1/1992
Firstpage
210
Lastpage
218
Abstract
The SURE computer program, a reliability-analysis tool for ultrareliable computer-system architectures, provides an efficient means for computing reasonably accurate upper and lower bounds for the death-state probabilities of a large class of semi-Markov models. Once a semi-Markov model is described using a simple input language, SURE automatically computes the upper and lower bounds on the probability of system failure. A parameter of the model can be specified as a variable over a range of values, which directs SURE to perform a sensitivity analysis automatically. The program provides a rapid computational capability for semi-Markov models, which are useful for describing the fault-handling behavior of fault-tolerant computer systems. The only modeling restriction imposed by the program is that the nonexponential recovery transitions must be fast in comparison to the mission time. The SURE reliability-analysis method uses a fast bounding theorem based on means and variances and yields upper and lower bounds on the probability of system failure. Techniques have been developed to enable SURE to solve models with loops and to calculate the operational-state probabilities. The computation is extremely fast, and large state-spaces can be solved directly; a pruning technique enables SURE to process extremely large models.
Keywords
Markov processes; fault tolerant computing; reliability theory; SURE computer program; death state probabilities; fast bounding theorem; fault-tolerant computer systems; lower bounds; mission time; nonexponential recovery transitions; operational-state probabilities; pruning technique; reliability analysis; semi-Markov model; sensitivity analysis; system failure probability; ultrareliable computer-system architectures; upper bounds; Differential equations; Digital systems; Fault tolerant systems; NASA; Operating systems; Probability; Reliability theory; Sensitivity analysis; System recovery; Voice mail
fLanguage
English
Journal_Title
IEEE Transactions on Reliability
Publisher
IEEE
ISSN
0018-9529
Type
jour
DOI
10.1109/24.257783
Filename
257783