Title :
Achieving reliability growth on real-time systems
Author :
Lane, Christopher A. ; Morrison, Joseph D.
Author_Institution :
IBM Corp., Rockville, MD, USA
Abstract :
This paper addresses the principles used to predict and attain reliability growth on real-time systems. System reliability modeling techniques that include software reliability, maintenance effectiveness, and failure recovery are discussed in detail. Several software reliability growth models are discussed, with emphasis on the measured reliability growth of fielded software. The impact of maintenance effectiveness, which is a measure of the maintainer's skill and training levels, is shown. The need to develop failure recovery algorithms and to measure their robustness is emphasized. All of these factors are combined with the failure and repair characteristics of hardware to create comprehensive reliability growth models for real-time systems. The authors' research indicates that effective failure recovery algorithms are the key to attaining highly reliable systems. Without them, the redundant computer systems that run banking and air traffic control will crash, with possibly disastrous results. The modeling and measurement techniques discussed in this paper give the reliability practitioner methods to predict and achieve the reliability growth that results from improved software reliability and recovery algorithms. A fault tolerant system's ability to recover from hardware and software failures is gauged by a parameter called coverage. Coverage is the conditional probability of recovery given that a failure has occurred. Because of its large impact on system reliability, the measurement of coverage is emphasized.
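The coverage parameter defined in the abstract can be illustrated with a minimal sketch. The model below is an assumption made for illustration and is not taken from the paper: a two-unit redundant system with no repair, a constant per-unit failure rate, and a first failure that is recovered with probability equal to the coverage. The function names `estimate_coverage` and `duplex_mttf` are hypothetical.

```python
def estimate_coverage(recovered: int, failures: int) -> float:
    """Point estimate of coverage: the fraction of observed failures
    from which the system recovered (for example, in fault-injection tests)."""
    if failures <= 0:
        raise ValueError("need at least one observed failure")
    if not 0 <= recovered <= failures:
        raise ValueError("recovered must be between 0 and failures")
    return recovered / failures


def duplex_mttf(failure_rate: float, coverage: float) -> float:
    """Mean time to system failure for an assumed two-unit system without repair.

    With both units up, the first failure arrives after 1 / (2 * rate) on average.
    With probability `coverage` the system recovers and runs on the surviving
    unit for a further 1 / rate on average; otherwise the system is down.
    """
    if failure_rate <= 0:
        raise ValueError("failure_rate must be positive")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a probability")
    return 1.0 / (2.0 * failure_rate) + coverage / failure_rate


if __name__ == "__main__":
    rate = 1e-3  # assumed failures per hour for each unit
    measured = estimate_coverage(recovered=95, failures=100)
    for cov in (1.0, measured, 0.5, 0.0):
        print(f"coverage={cov:.2f}  MTTF={duplex_mttf(rate, cov):.1f} h")
```

Under these assumptions, a system with zero coverage fails at the first unit failure, so its mean time to failure is shorter than that of a single unit. This is one way to see why the measurement of coverage matters so much for redundant systems.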
Keywords :
Markov processes; fault tolerant computing; real-time systems; reliability theory; software maintenance; software reliability; system recovery; conditional probability of recovery; coverage; failure recovery; fault tolerant system; maintenance effectiveness; reliability growth; reliability modeling techniques; robustness; Air traffic control; Banking; Computer crashes; Hardware; Predictive models; Software measurement
Conference_Title :
Reliability and Maintainability Symposium, 1994. Proceedings., Annual
Conference_Location :
Anaheim, CA
Print_ISBN :
0-7803-1786-6
DOI :
10.1109/RAMS.1994.291096