• DocumentCode
    3261599
  • Title
    Towards A Model-Based Autonomic Reliability Framework for Computing Clusters
  • Author
    Dubey, Abhishek ; Nordstrom, Steve ; Keskinpala, Turker ; Neema, Sandeep ; Bapty, Ted ; Karsai, Gabor
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    75
  • Lastpage
    85
  • Abstract
    One of the primary problems with computing clusters is ensuring that they maintain a reliable working state most of the time, so as to justify the economics of their operation. In this paper, we introduce a model-based hierarchical reliability framework that enables periodic monitoring of vital health parameters across the cluster and provides for autonomic fault mitigation. We also discuss some of the challenges faced by autonomic reliability frameworks in cluster environments, including non-determinism in task scheduling in standard operating systems such as Linux and the need for synchronized execution of monitoring sensors across the cluster. Additionally, we present a solution to these problems in the context of our framework, which uses a feedback-controller-based approach to compensate for scheduling jitter in non-real-time operating systems. Finally, we present experimental data that illustrate the effectiveness of our approach.
  • Keywords
    Adaptive control; Computer networks; Environmental economics; Hardware; Jitter; Linux; Maintenance; Operating systems; Quantum computing; Sensor systems; Autonomic Computing; Cluster Computing; Model Integrated Computing; Model-Based Design; Reliability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering of Autonomic and Autonomous Systems, 2008. EASE 2008. Fifth IEEE Workshop on
  • Conference_Location
    Belfast, Northern Ireland
  • Print_ISBN
    0-7695-3140-7
  • Type
    conf
  • DOI
    10.1109/EASe.2008.15
  • Filename
    4488290