Title :
Analysis of Restart Mechanisms in Software Systems
Author :
Van Moorsel, Aad P A ; Wolter, Katinka
Author_Institution :
Sch. of Comput. Sci., Newcastle Univ., UK
Abstract :
Restarts or retries are a common phenomenon in computing systems, for instance, in preventive maintenance, software rejuvenation, or when a failure is suspected. Typically, one sets a time-out to trigger the restart. We analyze and optimize time-out strategies for scenarios in which the expected required remaining time of a task is not always decreasing with the time invested in it. Examples of such tasks include the download of Web pages, randomized algorithms, distributed queries, and jobs subject to network or other failures. Assuming the independence of the completion time of successive tries, we derive computationally attractive expressions for the moments of the completion time, as well as for the probability that a task is able to meet a deadline. These expressions facilitate efficient algorithms to compute optimal restart strategies and are promising candidates for pragmatic online optimization of restart timers.
Keywords :
software fault tolerance; software maintenance; system recovery; fault-tolerant system; optimal restart strategy; software failure; software performance modeling; software preventive maintenance; software rejuvenation; software reliability modeling; software system restart mechanism analysis; software time-out strategy analysis; Adaptive systems; Computer network reliability; Failure analysis; Fault tolerant systems; Internet; Preventive maintenance; Software performance; Software systems; Web pages; Restart; completion time; fault-tolerant systems; performance and reliability modeling; self-management; time-out
Journal_Title :
Software Engineering, IEEE Transactions on
DOI :
10.1109/TSE.2006.73