• DocumentCode
    3636855
  • Title

    Improving Grid Fault Tolerance by Means of Global Behavior Modeling

  • Author

    Jesús Montes;Alberto Sánchez;María S. Pérez

  • Author_Institution
    CeSViMa, Univ. Politec. de Madrid, Madrid, Spain
  • fYear
    2010
  • Firstpage
    101
  • Lastpage
    108
  • Abstract
    Grid systems have proved to be one of the most important new alternatives for tackling challenging problems but, to exploit their benefits, dependability and fault tolerance are key aspects. However, the vast complexity of these systems limits the efficiency of traditional fault tolerance techniques. It seems necessary to distinguish between resource-level fault tolerance (focused on every machine) and service-level fault tolerance (focused on global behavior). Techniques based on these concepts can handle system complexity and increase dependability. We present an autonomous, self-adaptive fault tolerance framework for grid systems, based on a new approach to modeling distributed environments. The grid is considered as a single entity, instead of a set of independent resources. This point of view focuses on service-level fault tolerance, allowing us to see the big picture and understand the system's global behavior. The resulting model's simplicity is the key to providing system-wide fault tolerance.
  • Keywords
    "Fault tolerance","Fault tolerant systems","Grid computing","Distributed computing","Large-scale systems","Cloud computing","Image storage","Control systems","Degradation","Resource management"
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Computing (ISPDC), 2010 Ninth International Symposium on
  • Print_ISBN
    978-1-4244-7602-2
  • Type

    conf

  • DOI
    10.1109/ISPDC.2010.20
  • Filename
    5532500