DocumentCode :
893005
Title :
The Effect of Incomplete and Deleterious Periodic Maintenance on Fault-Tolerant Computer Systems
Author :
Yak, Y.W. ; Dillon, T.S. ; Forward, K.E.
Author_Institution :
Philips Communications Systems Ltd., Clayton
Volume :
35
Issue :
1
fYear :
1986
fDate :
4/1/1986
Firstpage :
85
Lastpage :
90
Abstract :
Maintenance is a common technique to achieve the reliability requirements of fault-tolerant computer systems. Depending on the system, maintenance may be carried out upon the failure of any one module, at regular intervals, or subject to some other maintenance strategy. Due to human factors, maintenance is not always perfect, contrary to usual assumptions. There are several classes of imperfection that adequately describe maintenance strategies. A system with scheduled shut-down times for maintenance is best described by maintenance action that might be incomplete. On the other hand, a system that has to be diagnosed or maintained on-line is best described by maintenance action that may provoke a system failure; we term this deleterious maintenance. This paper examines these two classes of imperfection and obtains the mean time-to-first-failure (MTFF) for each class. The results of three studies on Triple Modular Redundancy (TMR) systems are given for the case of maintenance-induced failures in fault-tolerant computer systems. In systems with maintenance-induced failure, an optimum maintenance interval exists. The optimum MTFF is less than the optimum MTFF obtained if perfect maintenance is assumed. This is the limit to which maintenance can be used to improve the performance of the system. To increase MTFF beyond this, spares must be used. When these spares are substituted, the probability of inducing a failure must be much lower than that of maintenance-induced failure. For systems that have to be maintained on-line, it is important not to neglect the case of deleterious maintenance.
Keywords :
Availability; Communication systems; Costs; Electronic switching systems; Fault tolerant systems; Human factors; Power system reliability; Preventive maintenance; Redundancy; Reliability theory
fLanguage :
English
Journal_Title :
Reliability, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9529
Type :
jour
DOI :
10.1109/TR.1986.4335358
Filename :
4335358