• DocumentCode
    1975977
  • Title
    Resource constrained failure management in networked computing systems
  • Author
    Bommannavar, Praveen ; Bambos, Nicholas
  • Author_Institution
    Manage. Sci. & Eng., Stanford Univ., Stanford, CA, USA
  • fYear
    2012
  • fDate
    3-7 Dec. 2012
  • Firstpage
    1884
  • Lastpage
    1889
  • Abstract
    We examine the problem of fault detection in networked computing systems and highlight the tradeoff between diagnosing and reacting to potentially harmful real-time events and minimizing the number of times the system is reset or scanned for malicious activity. The health states of a system are modeled as states in a Markov chain, and we use a model fitting approach to estimate the transitions between these states. We then consider a scenario in which a system is to be deployed over a fixed horizon, with a limit on the number of times that the health state can be scanned and the system can be reset. Each health state is assigned a cost according to the performance of the system while in that state. Dynamic programming is then used to find an optimal admissible policy (one that obeys the usage limitation constraints), which achieves the lowest expected aggregate cost. Finally, we examine some properties of the solution.
  • Keywords
    Markov processes; distributed processing; dynamic programming; fault tolerant computing; resource allocation; Markov chain; distributed computing; distributed storage; fault detection; model fitting approach; networked computing system; optimal admissible policy; resource constrained failure management; usage limitation constraint; budgeted estimation; failure management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Global Communications Conference (GLOBECOM), 2012 IEEE
  • Conference_Location
    Anaheim, CA
  • ISSN
    1930-529X
  • Print_ISBN
    978-1-4673-0920-2
  • Electronic_ISBN
    1930-529X
  • Type
    conf
  • DOI
    10.1109/GLOCOM.2012.6503390
  • Filename
    6503390
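
The abstract describes a finite-horizon dynamic program over Markov health states with a cap on resets. The sketch below illustrates that general idea only and is not the authors' formulation: it assumes the health state is observed for free at every step (the paper also budgets scans), that a reset instantly returns the system to a designated healthy state, and that the transition matrix is already known. The names `solve_reset_budget`, `P`, `cost`, `horizon`, `budget`, and `reset_state` are all hypothetical.

```python
def solve_reset_budget(P, cost, horizon, budget, reset_state=0):
    """Backward induction for a reset-budgeted Markov cost problem.

    Assumptions (illustrative, not taken from the paper):
      - P[s][j] is the known probability of moving from state s to state j.
      - cost[s] is the per-step cost of operating in state s.
      - The state is fully observed; only resets are budgeted.
      - A reset moves the system to reset_state before the step's cost accrues.

    Returns (V, policy) where V[k][s] is the minimum expected total cost from
    time 0 with k resets left in state s, and policy[t][k][s] is True when
    resetting is strictly better than waiting.
    """
    n = len(cost)
    # Terminal condition: no cost after the horizon ends.
    v_next = [[0.0] * n for _ in range(budget + 1)]
    policy = []

    for _ in range(horizon):
        v_now = [[0.0] * n for _ in range(budget + 1)]
        act = [[False] * n for _ in range(budget + 1)]
        for k in range(budget + 1):
            for s in range(n):
                # Option 1: keep running in the current state.
                wait = cost[s] + sum(P[s][j] * v_next[k][j] for j in range(n))
                best, do_reset = wait, False
                # Option 2: spend one reset, if any remain.
                if k > 0:
                    reset = cost[reset_state] + sum(
                        P[reset_state][j] * v_next[k - 1][j] for j in range(n)
                    )
                    if reset < wait:
                        best, do_reset = reset, True
                v_now[k][s] = best
                act[k][s] = do_reset
        policy.append(act)
        v_next = v_now

    # Stages were computed from the last step backward; reorder to t = 0, 1, ...
    policy.reverse()
    return v_next, policy


# Toy example: state 0 is healthy, state 1 is a degraded absorbing state.
P = [[0.9, 0.1], [0.0, 1.0]]
cost = [0.0, 1.0]
V, policy = solve_reset_budget(P, cost, horizon=2, budget=1)
```

In this toy instance the table is indexed by remaining budget, so comparing `V[0]` with `V[1]` shows how much expected cost a single available reset saves from each starting state. Adding a scan budget, as the paper does, would require tracking a belief over health states rather than the state itself.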