Title :
Job Migration and Fault Tolerance in SLA-Aware Resource Management Systems
Author :
Battre, D. ; Hovestadt, Matthias ; Kao, Odej ; Keller, Axel ; Voss, Kerstin
Author_Institution :
Tech. Univ. Berlin, Berlin
Abstract :
Contractually fixed service quality levels are mandatory prerequisites for attracting the commercial user to Grid environments. Service level agreements (SLAs) are powerful instruments for describing obligations and expectations in such a business relationship. At the level of local resource management systems, checkpointing and restart is an important instrument for realizing fault tolerance and SLA-awareness. This paper highlights the concepts of migrating such checkpoint datasets to achieve the goal of SLA-compliant job execution.
Keywords :
grid computing; software fault tolerance; fault tolerance; job migration; resource management systems; service level agreements; Business; Checkpointing; Fault tolerant systems; Instruments; Middleware; Quality of service; Resource management; Risk management; Grid; Migration; RMS; Resource Management System; SLA; Service Level Agreement
Conference_Title :
Grid and Pervasive Computing Workshops, 2008. GPC Workshops '08. The 3rd International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-0-7695-3177-9
DOI :
10.1109/GPC.WORKSHOPS.2008.71