DocumentCode :
524656
Title :
Proactive Failure Management for High Availability Computing in Computer Clusters
Author :
Zhang, Ziming ; Fu, Song
Author_Institution :
Dept. of Comput. Sci. & Eng., New Mexico Inst. of Min. & Technol., Socorro, NM, USA
Volume :
1
fYear :
2010
fDate :
28-31 May 2010
Firstpage :
377
Lastpage :
381
Abstract :
In this paper, we propose a framework for autonomic failure management with hierarchical failure prediction functionality for coalition clusters. It analyzes node, cluster and system-wide failure behaviors and forecasts prospective failure occurrences based on quantified failure dynamics. Failure correlations are inspected by the predictor. Experimental results in a computational grid on campus show that the offline and online predictions by our predictors accurately forecast the failure trend and capture failure correlations in a coalition cluster environment.
Keywords :
Availability; Conference management; Data analysis; Engineering management; Failure analysis; Grid computing; Large-scale systems; Performance analysis; Resource management; Technology management;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Science and Optimization (CSO), 2010 Third International Joint Conference on
Conference_Location :
Huangshan, Anhui, China
Print_ISBN :
978-1-4244-6812-6
Electronic_ISBN :
978-1-4244-6813-3
Type :
conf
DOI :
10.1109/CSO.2010.71
Filename :
5533049