DocumentCode :
1975977
Title :
Resource constrained failure management in networked computing systems
Author :
Bommannavar, Praveen ; Bambos, Nicholas
Author_Institution :
Manage. Sci. & Eng., Stanford Univ., Stanford, CA, USA
fYear :
2012
fDate :
3-7 Dec. 2012
Firstpage :
1884
Lastpage :
1889
Abstract :
We examine the problem of fault detection in networked computing systems and highlight the tradeoff between diagnosing/reacting to potentially harmful real-time events and minimizing the number of times the system is reset or scanned for malicious activity. The various health states of a system are modeled as states in a Markov chain, and we use a model fitting approach to estimate the transitions between these states. We proceed by considering a scenario in which a system is to be deployed over a fixed horizon but with a limit on the number of times that the health state can be scanned and the system can be reset. Each health state is assigned a cost according to the performance of the system while in that state. Dynamic Programming is then used to find an optimal admissible policy (one that obeys the usage limitation constraints) which achieves the lowest expected aggregate cost. Finally, we examine some properties of the solution.
Keywords :
Markov processes; distributed processing; dynamic programming; fault tolerant computing; resource allocation; Markov chain; distributed computing; distributed storage; fault detection; model fitting approach; networked computing system; optimal admissible policy; resource constrained failure management; usage limitation constraint; budgeted estimation; failure management
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Global Communications Conference (GLOBECOM), 2012 IEEE
Conference_Location :
Anaheim, CA
ISSN :
1930-529X
Print_ISBN :
978-1-4673-0920-2
Electronic_ISBN :
1930-529X
Type :
conf
DOI :
10.1109/GLOCOM.2012.6503390
Filename :
6503390