DocumentCode :
939320
Title :
A comprehensive model for software rejuvenation
Author :
Vaidyanathan, Kalyanaraman ; Trivedi, Kishor S.
Author_Institution :
Scalable Syst. Group, Sun Microsystems, San Diego, CA, USA
Volume :
2
Issue :
2
fYear :
2005
Firstpage :
124
Lastpage :
137
Abstract :
Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failure, is the result of exhaustion of operating system resources, data corruption, and numerical error accumulation. To counteract software aging, a technique called software rejuvenation has been proposed, which essentially involves occasionally terminating an application or a system, cleaning its internal state and/or its environment, and restarting it. Since rejuvenation incurs an overhead, an important research issue is to determine optimal times to initiate this action. In this paper, we first describe how to include faults attributed to software aging in the framework of Gray's software fault classification (deterministic and transient), and study the treatment and recovery strategies for each of the fault classes. We then construct a semi-Markov reward model based on workload and resource usage data collected from the UNIX operating system. We identify different workload states using statistical cluster analysis, and estimate transition probabilities and sojourn time distributions from the data. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource depletion in each state. The model is then solved to obtain estimated times to exhaustion for each resource. The results from the semi-Markov reward model are then fed into a higher-level availability model that accounts for failure followed by reactive recovery, as well as proactive recovery. This comprehensive model is then used to derive optimal rejuvenation schedules that maximize availability or minimize downtime cost.
Keywords :
Markov processes; software fault tolerance; software maintenance; software reliability; statistical analysis; Gray software fault classification; UNIX operating system; crash-hang failure; data corruption; estimate transition probabilities; higher-level availability model; numerical error accumulation; operating system resources; semi-Markov reward model; software aging; software rejuvenation; sojourn time distributions; statistical cluster analysis; system performance degradation; Aging; Application software; Availability; Cleaning; Computer crashes; Degradation; Operating systems; Software systems; State estimation; System performance; Index Terms: Availability; measurement-based dependability evaluation; semi-Markov reward models; software aging; software rejuvenation; workload characterization
fLanguage :
English
Journal_Title :
Dependable and Secure Computing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1545-5971
Type :
jour
DOI :
10.1109/TDSC.2005.15
Filename :
1453531