Title :
Fault Tolerance using "Parallel Shadow Image Servers (PSIS)" in Grid Based Computing Environment
Author :
Hussain, Naveed ; Ansari, M.A. ; Yasin, M.M. ; Rauf, Abdul ; Haider, Sajjad
Author_Institution :
Dept. of Inf. Technol., Nat. Univ. of Modern Languages, Islamabad
Abstract :
This paper presents a critical review of existing fault tolerance mechanisms in grid computing and of the overhead they incur in reprocessing or rescheduling jobs when a fault arises. To reduce this overhead, we propose parallel shadow image servers (PSIS), a copying technique that runs in parallel with the resource manager and maintains checkpoints so that a job can be rescheduled from the nearest flag when a fault is detected. The resource manager node schedules the job to the worker nodes, and after a pre-specified amount of time the worker nodes submit the job back, in serialized form, to the parallel shadow image servers. We call this saved copy the recent spawn, or the flag checkpoint, for rescheduling or reprocessing the job. If a fault arises, the job is rescheduled from the most recent checkpoint and submitted to the worker node from which it was terminated. This not only saves time but also improves performance to a major extent.
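The checkpoint-and-reschedule cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ShadowImageServer`, `WorkerFault`, `run_job`, and `resource_manager` are hypothetical, the job is a toy summation, the fault is injected with a `fail_at` parameter, and the pre-specified time interval is replaced by an item count (`interval`) so the run is deterministic.

```python
import pickle


class ShadowImageServer:
    """Hypothetical stand-in for one PSIS node: keeps the latest serialized job state."""

    def __init__(self):
        self._images = {}  # job_id -> pickled state (the "flag checkpoint")

    def store(self, job_id, state):
        # Serialize the state so the stored image is independent of the worker's copy.
        self._images[job_id] = pickle.dumps(state)

    def latest(self, job_id):
        blob = self._images.get(job_id)
        return pickle.loads(blob) if blob is not None else None


class WorkerFault(Exception):
    """Raised to simulate a worker node failing mid-job (assumption for the demo)."""


def run_job(job_id, numbers, shadows, interval, fail_at=None):
    """Sum `numbers`, sending a checkpoint to every shadow server each `interval` items.

    Starts from the latest checkpoint if one exists, otherwise from the beginning.
    """
    state = shadows[0].latest(job_id) or {"next": 0, "total": 0}
    for i in range(state["next"], len(numbers)):
        if fail_at is not None and i == fail_at:
            raise WorkerFault(f"worker failed at item {i}")
        state["total"] += numbers[i]
        state["next"] = i + 1
        if state["next"] % interval == 0:
            for server in shadows:
                server.store(job_id, state)
    return state["total"]


def resource_manager(job_id, numbers, shadows, interval, fail_at=None):
    """Schedule the job; on a fault, reschedule it from the most recent checkpoint."""
    try:
        return run_job(job_id, numbers, shadows, interval, fail_at)
    except WorkerFault:
        # Only the work done since the last checkpoint is repeated.
        return run_job(job_id, numbers, shadows, interval)
```

With two shadow servers, the numbers 1 to 10, `interval=3`, and a fault injected at item index 7, the rescheduled run resumes from the checkpoint taken after six items and repeats only the seventh item instead of restarting the whole job.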
Keywords :
grid computing; scheduling; software fault tolerance; fault tolerance; flag check point; grid based computing environment; parallel shadow image servers; resource manager node; Checkpointing; Computer science; Concurrent computing; Fault tolerance; Grid computing; Information technology; Load management; Peer to peer computing; Processor scheduling; Resource management;
Conference_Title :
Emerging Technologies, 2006. ICET '06. International Conference on
Conference_Location :
Peshawar
Print_ISBN :
1-4244-0502-5
Electronic_ISBN :
1-4244-0503-3
DOI :
10.1109/ICET.2006.335982