• DocumentCode
    3026094
  • Title
    Proactive fault handling for system availability enhancement
  • Author
    Salfner, Felix; Malek, Miroslaw
  • Author_Institution
    Dept. of Comput. Sci., Humboldt-Univ., Berlin, Germany
  • fYear
    2005
  • fDate
    4-8 April 2005
  • Abstract
    Proactive fault handling combines prevention and repair actions with failure prediction techniques. We extend the standard availability formula by five key measures: (1) precision and (2) recall assess failure prediction, while failure handling is gauged by (3) prevention probability, (4) repair time improvement, and (5) risk of introducing additional failures. We give a short survey of actions that are suitable for combination with failure prediction and provide a procedure to estimate the five key measures. Altogether, this makes it possible to quantify the impact of proactive fault handling on system availability and may provide valuable input for system design.
  • Keywords
    failure analysis; fault tolerant computing; probability; system recovery; failure prediction techniques; prevention probability; proactive fault handling; repair time improvement; standard availability formula; system availability enhancement; Computer science; Counting circuits; Distributed processing; Equations; Measurement standards; Prediction methods; Preventive maintenance; Processor scheduling; State estimation; Time measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
  • Print_ISBN
    0-7695-2312-9
  • Type
    conf
  • DOI
    10.1109/IPDPS.2005.360
  • Filename
    1420243