مرکز منطقه ای اطلاع رساني علوم و فناوري - Effect of System Processes on Error Repetition: A Probabilistic and Measurement Approach

Abstract :

This paper presents an approach to system reliability modeling where failures and errors are not statistically independent. The repetition of failures and errors until their causes are removed is affected by the system processes and degrades system reliability. Four types of failures are introduced: hardware transients, software and hardware design errors, and program faults. Probability of failure, mean time to failure, and system reliability depend on the type of failure. Actual measurements show that the most critical factor for system reliability is the time after occurrence of a failure when this failure can be repeated in every process that accesses a failed component. An example involving measurements collected in an IBM 4331 installation validates the model and shows its applications. The degradation of system reliability can be appreciable even for very short periods of time. This is why the conditional probability of repetition of failures is introduced. The reliability model allows prediction of system reliability based on the calculation of the mean time to failure. The comparison with the measurement results shows that the model with process dependent repetition of failures approximates system reliability with better accuracy than the model with the assumption of independent failures.