• DocumentCode
    1728605
  • Title

    Job Migration and Fault Tolerance in SLA-Aware Resource Management Systems

  • Author

    Battre, D. ; Hovestadt, Matthias ; Kao, Odej ; Keller, Axel ; Voss, Kerstin

  • Author_Institution
    Tech. Univ. Berlin, Berlin
  • fYear
    2008
  • Firstpage
    43
  • Lastpage
    48
  • Abstract
    Contractually fixed service quality levels are mandatory prerequisites for attracting the commercial user to Grid environments. Service level agreements (SLAs) are powerful instruments for describing obligations and expectations in such a business relationship. At the level of local resource management systems, checkpointing and restart is an important instrument for realizing fault tolerance and SLA-awareness. This paper highlights the concepts of migrating such checkpoint datasets to achieve the goal of SLA-compliant job execution.
  • Keywords
    grid computing; software fault tolerance; fault tolerance; job migration; resource management systems; service level agreements; Business; Checkpointing; Fault tolerance; Fault tolerant systems; Grid computing; Instruments; Middleware; Quality of service; Resource management; Risk management; Checkpointing; Fault Tolerance; Grid; Migration; RMS; Resource Management System; SLA; Service Level Agreement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid and Pervasive Computing Workshops, 2008. GPC Workshops '08. The 3rd International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-0-7695-3177-9
  • Type

    conf

  • DOI
    10.1109/GPC.WORKSHOPS.2008.71
  • Filename
    4539323