DocumentCode :
2776711
Title :
A distributed load-based failure recovery mechanism for advance reservation environments
Author :
Burchard, Lars-Olof ; Linnert, Barry ; Schneider, Jörg
Author_Institution :
Technische Univ. Berlin, Germany
Volume :
2
fYear :
2005
fDate :
9-12 May 2005
Firstpage :
1071
Abstract :
Advance resource reservation is a mature concept for allocating various resources, particularly in grid environments. Common grid toolkits support advance reservations and assign jobs to resources at admission time. In such a distributed environment, carefully tailored failure recovery mechanisms are needed that provide seamless, transparent migration of jobs from one resource to another. Because migrating running jobs is difficult, an important issue in advance reservation (i.e., planning-based) management infrastructures is determining the duration of a failure in order to remap jobs that are already allocated to a currently failed resource but not yet active. As shown in previous work, underestimating the failure duration, and consequently remapping too few jobs, results in an increased number of job terminations. To overcome this drawback, we propose a load-based computation of the jobs to be remapped. A centralized and a distributed version of the strategy are presented, showing that no knowledge beyond the local allocation on the failed resource is necessary. These load-based strategies achieve effective remapping of jobs while avoiding estimations of the failure duration, which are inevitably inaccurate.
Keywords :
grid computing; resource allocation; system recovery; advance resource reservation environments; distributed environment; distributed load-based failure recovery mechanism; grid tool kits; job remapping; transparent job migration; Context-aware services; Data processing; Databases; Environmental management; Grid computing; Information filtering; Information filters; Internet; Resource management; Satellites
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on
Print_ISBN :
0-7803-9074-1
Type :
conf
DOI :
10.1109/CCGRID.2005.1558679
Filename :
1558679