Title :
An extensible framework for repair-driven monitoring
Author :
Reidemeister, Thomas ; Jiang, Miao ; Ward, Paul A S
Author_Institution :
E&CE Dept., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
In recent years, autonomic computing, specifically autonomic data centre management, has gained significant attention. Human intervention must be minimized to reduce the operating costs of business applications. In this paper we focus on the self-repair dimension and present a flexible probabilistic framework for developing self-repair agents in the context of business-information-system components. Our framework seeks to pick the optimal sequence of repair actions given only imperfect information about the experienced fault. In contrast to existing recovery-oriented approaches, our model explicitly considers fault prevalence, symptoms of recurrent failures, and inclusive repair actions. We evaluate our proposal using discrete event simulation. Our evaluation shows that an optimal repair policy can be computed from a brief specification of repair actions. Even in the context of very unreliable error detection, our controller is able to estimate the current state of the monitored system and recover from failure.
Keywords :
business data processing; computer centres; cost reduction; discrete event simulation; fault tolerant computing; information systems; multi-agent systems; probability; system monitoring; agents; autonomic computing; autonomic data centre management; business application; business-information-system component; discrete event simulation; extensible framework; fault prevalence; flexible probabilistic framework; human intervention; inclusive repair actions; operating cost reduction; optimal repair policy; recurrent failure symptoms; repair-driven monitoring; self-repair dimension; system monitoring; Context; Fault diagnosis; Maintenance engineering; Markov processes; Mathematical model; Monitoring; Probes;
Conference_Title :
Network and Service Management (CNSM), 2010 International Conference on
Conference_Location :
Niagara Falls, ON
Print_ISBN :
978-1-4244-8910-7
Electronic_ISBN :
978-1-4244-8908-4
DOI :
10.1109/CNSM.2010.5691320