DocumentCode :
3369114
Title :
A measurement-based model for estimation of resource exhaustion in operational software systems
Author :
Vaidyanathan, Kalyanaraman ; Trivedi, Kishor S.
Author_Institution :
Dept. of Electr. & Comput. Eng., Duke Univ., Durham, NC, USA
fYear :
1999
fDate :
1999
Firstpage :
84
Lastpage :
93
Abstract :
Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of “software aging”, in which the state of the software system degrades with time, has been reported (S. Garg et al., 1998). The primary causes of this degradation are the exhaustion of operating system resources, data corruption and numerical error accumulation. This may eventually lead to performance degradation of the software or crash/hang failure, or both. Earlier work in this area to detect aging and to estimate its effect on system resources did not take into account the system workload. In this paper, we propose a measurement-based model to estimate the rate of exhaustion of operating system resources both as a function of time and the system workload state. A semi-Markov reward model is constructed based on workload and resource usage data collected from the UNIX operating system. We first identify different workload states using statistical cluster analysis and build a state-space model. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource exhaustion in the different states. The model is then solved to obtain trends and the estimated exhaustion rates and the time-to-exhaustion for the resources. With the help of this measure, proactive fault management techniques such as “software rejuvenation” (Y. Huang et al., 1995) may be employed to prevent unexpected outages.
Keywords :
Markov processes; Unix; operating systems (computers); parameter estimation; pattern clustering; resource allocation; software metrics; software performance evaluation; software reliability; state-space methods; system recovery; UNIX; crash/hang failure; data corruption; measurement-based model; numerical error accumulation; operating system resource exhaustion rate estimation; operational software systems; outages; proactive fault management techniques; resource usage data; reward function; semi-Markov reward model; software aging; software performance degradation; software rejuvenation; software system state degradation; state-space model; statistical cluster analysis; system workload states; time to exhaustion; transient errors; Aging; Computer crashes; Computer errors; Degradation; Operating systems; Read only memory; Software measurement; Software safety; Software systems; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Reliability Engineering, 1999. Proceedings. 10th International Symposium on
Conference_Location :
Boca Raton, FL
ISSN :
1071-9458
Print_ISBN :
0-7695-0443-4
Type :
conf
DOI :
10.1109/ISSRE.1999.809313
Filename :
809313