DocumentCode
1081641
Title
Using multi-stage and stratified sampling for inferring fault-coverage probabilities
Author
Constantinescu, Cristian
Author_Institution
Duke Univ., Durham, NC, USA
Volume
44
Issue
4
fYear
1995
fDate
12/1/1995 12:00:00 AM
Firstpage
632
Lastpage
639
Abstract
Development of fault-tolerant computing systems requires accurate reliability modeling. Analytic, simulation, and hybrid models are commonly used for obtaining reliability measures. These measures are functions of component failure rates and fault-coverage probabilities. Coverage provides information about the fault and error detection, isolation, and system recovery capabilities. This parameter can be derived by physical or simulated fault injection. Statistical inference has been used to extract meaningful information from sample observations. The problem of conducting fault injection experiments and statistically inferring the coverage from the information gathered in those experiments is addressed in this paper. We perform statistical experiments in a multi-dimensional space of events. In this way all major factors which influence the coverage (fault locations, timing characteristics of the fault, and the workload) are accounted for. Multi-stage, stratified, and combined multi-stage and stratified sampling are used in this paper for deriving the coverage. Equations of the mean, variance, and confidence interval of the coverage are provided. The statistical error produced by the injected faults which do not induce errors in the tested system (also known as the nonresponse problem) is considered. A program which emulates a typical fault environment was developed and four hypothetical systems are analyzed.
Keywords
fault tolerant computing; probability; reliability; reliability theory; system recovery; analytic models; component failure rates; confidence interval; error detection; fault detection; fault injection simulation; fault-coverage; fault-coverage probabilities; hybrid models; multi-stage sampling; nonresponse problem; reliability modeling; simulation models; statistical inference; stratified sampling; system recovery capabilities; timing characteristics; Analytical models; Computational modeling; Data mining; Fault detection; Fault location; Fault tolerant systems; Probability; Sampling methods; System recovery; Timing
fLanguage
English
Journal_Title
Reliability, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9529
Type
jour
DOI
10.1109/24.475993
Filename
475993