DocumentCode :
2149723
Title :
An extensible framework for repair-driven monitoring
Author :
Reidemeister, Thomas ; Jiang, Miao ; Ward, Paul A S
Author_Institution :
E&CE Dept., Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2010
fDate :
25-29 Oct. 2010
Firstpage :
142
Lastpage :
149
Abstract :
In recent years autonomic computing, specifically autonomic data centre management, has gained significant attention. Human intervention must be minimized to reduce the operating costs of business applications. In this paper we focus our attention on the self-repair dimension and present a flexible probabilistic framework to develop agents for self-repair in the context of business-information-system components. Our framework seeks to pick the optimal sequence of repair actions given only imperfect information about the experienced fault. In contrast to existing recovery-oriented approaches, our model explicitly considers fault prevalence, symptoms of recurrent failures, and inclusive repair actions. We evaluate our proposal using discrete event simulation. Our evaluation shows that an optimal repair policy can be computed from a brief specification of repair actions. Even in the context of very unreliable error detection, our controller is able to estimate the current state of the monitored system and recover from failure.
Keywords :
business data processing; computer centres; cost reduction; discrete event simulation; fault tolerant computing; information systems; multi-agent systems; probability; system monitoring; agents; autonomic computing; autonomic data centre management; business application; business-information-system component; discrete event simulation; extensible framework; fault prevalence; flexible probabilistic framework; human intervention; inclusive repair actions; operating cost reduction; optimal repair policy; recurrent failure symptoms; repair-driven monitoring; self-repair dimension; system monitoring; Context; Fault diagnosis; Maintenance engineering; Markov processes; Mathematical model; Monitoring; Probes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network and Service Management (CNSM), 2010 International Conference on
Conference_Location :
Niagara Falls, ON
Print_ISBN :
978-1-4244-8910-7
Electronic_ISBN :
978-1-4244-8908-4
Type :
conf
DOI :
10.1109/CNSM.2010.5691320
Filename :
5691320