DocumentCode
3026094
Title
Proactive fault handling for system availability enhancement
Author
Salfner, Felix ; Malek, Miroslaw
Author_Institution
Dept. of Comput. Sci., Humboldt-Univ., Berlin, Germany
fYear
2005
fDate
4-8 April 2005
Abstract
Proactive fault handling combines prevention and repair actions with failure prediction techniques. We extend the standard availability formula by five key measures: (1) precision and (2) recall assess failure prediction, while failure handling is gauged by (3) prevention probability, (4) repair time improvement, and (5) risk of introducing additional failures. We give a short survey of actions that are suited to be combined with failure prediction and provide a procedure to estimate the five key measures. Altogether, this makes it possible to quantify the impact of proactive fault handling on system availability and may provide valuable input for system design.
Keywords
failure analysis; fault tolerant computing; probability; system recovery; failure prediction techniques; prevention probability; proactive fault handling; repair time improvement; standard availability formula; system availability enhancement; Computer science; Counting circuits; Distributed processing; Equations; Measurement standards; Prediction methods; Preventive maintenance; Processor scheduling; State estimation; Time measurement
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
Print_ISBN
0-7695-2312-9
Type
conf
DOI
10.1109/IPDPS.2005.360
Filename
1420243
Link To Document