DocumentCode
3261599
Title
Towards A Model-Based Autonomic Reliability Framework for Computing Clusters
Author
Dubey, Abhishek ; Nordstrom, Steve ; Keskinpala, Turker ; Neema, Sandeep ; Bapty, Ted ; Karsai, Gabor
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
75
Lastpage
85
Abstract
One of the primary problems with computing clusters is ensuring that they maintain a reliable working state most of the time to justify the economics of operation. In this paper, we introduce a model-based hierarchical reliability framework that enables periodic monitoring of vital health parameters across the cluster and provides for autonomic fault mitigation. We also discuss some of the challenges faced by autonomic reliability frameworks in cluster environments, such as non-determinism in task scheduling in standard operating systems like Linux and the need for synchronized execution of monitoring sensors across the cluster. Additionally, we present a solution to these problems in the context of our framework, which uses a feedback-controller-based approach to compensate for the scheduling jitter in non-real-time operating systems. Finally, we present experimental data that illustrate the effectiveness of our approach.
Keywords
Adaptive control; Computer networks; Environmental economics; Hardware; Jitter; Linux; Maintenance; Operating systems; Quantum computing; Sensor systems; Autonomic Computing; Cluster Computing; Model Integrated Computing; Model-Based Design; Reliability;
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering of Autonomic and Autonomous Systems, 2008. EASE 2008. Fifth IEEE Workshop on
Conference_Location
Belfast, Northern Ireland
Print_ISBN
0-7695-3140-7
Type
conf
DOI
10.1109/EASe.2008.15
Filename
4488290