DocumentCode :
584807
Title :
Application checkpointing in grid environment with improved checkpoint reliability through replication
Author :
Bawa, Rajesh Kumar ; Singh, Rajdeep
Author_Institution :
Dept. of Comput. Sci., Punjabi Univ., Patiala, India
fYear :
2012
fDate :
26-28 July 2012
Firstpage :
1
Lastpage :
6
Abstract :
Grid technologies are emerging as the next generation of distributed computing, allowing the aggregation of heterogeneous, geographically distributed resources. The heterogeneous nature of the grid makes it more vulnerable to faults, which lead either to job failure or to delayed job completion. Checkpointing is one of several fault tolerance techniques used to make the grid more efficient and reliable. In this paper we develop an application-checkpointing-based fault tolerance technique for an Alchemi-based grid environment. In this technique, application threads generate their checkpoints and store them in the checkpoint table at the manager node. If a thread fails, its checkpoint is used to resume execution from the point of failure. The technique introduces a slight overhead in fault-free situations but is very effective in case of a node failure. Increasing the checkpoint frequency improves a job's ability to resume but also increases the overhead of generating and storing checkpoints, which increases the job's processing time.
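The mechanism summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' Alchemi implementation (Alchemi is a .NET framework); it is a Python illustration written under stated assumptions. The names `CheckpointTable`, `run_thread`, `NodeFailure`, and `drop_replica` are hypothetical, the replicated table is modeled as in-memory dictionaries, and the job is a toy summation. Note that in this sketch a restarted thread resumes from its last stored checkpoint, so any steps executed after that checkpoint are repeated.

```python
from typing import Dict, List, Optional, Tuple


class NodeFailure(Exception):
    """Hypothetical signal that the node running a thread has failed."""


class CheckpointTable:
    """Hypothetical manager-side table: thread id -> latest checkpoint.

    Every checkpoint is written to all replicas (an assumed replication
    scheme), so losing one copy does not lose the checkpoint.
    """

    def __init__(self, replicas: int = 2) -> None:
        self._replicas: List[Dict[int, dict]] = [{} for _ in range(replicas)]

    def store(self, thread_id: int, state: dict) -> None:
        for replica in self._replicas:
            replica[thread_id] = dict(state)  # copy so replicas share nothing

    def drop_replica(self, index: int) -> None:
        """Simulate the loss of one copy of the table."""
        self._replicas[index].clear()

    def load(self, thread_id: int) -> Optional[dict]:
        # Return the checkpoint from the first replica that still has it.
        for replica in self._replicas:
            if thread_id in replica:
                return dict(replica[thread_id])
        return None


def run_thread(
    thread_id: int,
    total_steps: int,
    interval: int,
    table: CheckpointTable,
    fail_at: Optional[int] = None,
) -> Tuple[int, int]:
    """Sum 0..total_steps-1, checkpointing every `interval` steps.

    Returns (result, steps_executed_in_this_run). If `fail_at` is set,
    a NodeFailure is raised just before that step is executed.
    """
    saved = table.load(thread_id) or {"next_step": 0, "acc": 0}
    step, acc = saved["next_step"], saved["acc"]
    executed = 0
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise NodeFailure(f"thread {thread_id} failed at step {step}")
        acc += step
        step += 1
        executed += 1
        if step % interval == 0:
            table.store(thread_id, {"next_step": step, "acc": acc})
    return acc, executed


if __name__ == "__main__":
    table = CheckpointTable(replicas=2)
    try:
        run_thread(1, total_steps=10, interval=3, table=table, fail_at=7)
    except NodeFailure as exc:
        print("failure:", exc)
    table.drop_replica(0)  # one checkpoint copy is lost as well
    print(run_thread(1, total_steps=10, interval=3, table=table))
```

A smaller `interval` means fewer repeated steps after a failure but more calls to `store`, which mirrors the trade-off between resuming capability and checkpoint overhead stated in the abstract.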
Keywords :
checkpointing; fault tolerant computing; grid computing; reliability; resource allocation; scheduling; Alchemi based grid environment; application checkpointing based fault tolerance technique; application threads; fault free situations; geographically distributed heterogeneous resources; grid environment; grid technologies; improved checkpoint reliability; job execution delay; job failure; job resuming capability; next generation distributed computing; replication; Message systems; Reliability engineering; Time frequency analysis; Fault Tolerance; Job Scheduling; QoS (Quality of Service); Resource Management;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing Communication & Networking Technologies (ICCCNT), 2012 Third International Conference on
Conference_Location :
Coimbatore
Type :
conf
DOI :
10.1109/ICCCNT.2012.6395974
Filename :
6395974