Title :
A Measurement-Based Model for Workload Dependence of CPU Errors
Author :
Iyer, Ravishankar K. ; Rossetti, David J.
Author_Institution :
Center for Reliable Computing, Computer Systems Laboratory, Stanford University, Stanford, CA 94305; Computer Systems Group, Coordinated Science Laboratory, and the Department of Electrical and Computer Engineering, University of Illinois, Urbana, IL 61801
fDate :
6/1/1986 12:00:00 AM
Abstract :
This paper proposes and validates a methodology to measure explicitly the increase in the risk of a processor error with increasing workload. By relating the occurrence of a CPU-related error to the system activity just prior to the occurrence of the error, the approach measures the dynamic CPU workload/failure relationship. The measurements show that the probability of a CPU-related error (the load hazard) increases nonlinearly with increasing workload; i.e., the CPU rapidly deteriorates as end points are reached. The load hazard is observed to be most sensitive to system CPU utilization, the I/O rate, and the interrupt rates. The results are significant because they indicate that it may not be useful to push a system close to its performance limits (the previously accepted operating goal), since what we gain in slightly improved performance is more than offset by the degradation in reliability. Importantly, they also indicate that conventional reliability models need to be reevaluated so as to take system workload explicitly into account.
Keywords :
Data analysis; Degradation; Failure analysis; Hazards; Linear accelerators; Nonlinear dynamical systems; Performance analysis; Performance gain; Performance loss; Risk analysis; CPU errors; data analysis; measurements; reliability; workload
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.1986.5009428