DocumentCode :
596335
Title :
A distributed counter-based non-blocking coordinated checkpoint algorithm for grid computing applications
Author :
El-Sayed, G.A. ; Hossny, K.A.
Author_Institution :
Electr. Eng. Dept., Assiut Univ., Assiut, Egypt
fYear :
2012
fDate :
12-15 Dec. 2012
Firstpage :
80
Lastpage :
85
Abstract :
In distributed systems, there are many opportunities for failure. Any component of any compute node can fail, including, but not limited to, the processor, disk, memory, or network interface. Any of these failures will cause the processes running on the affected node to crash or produce incorrect results. The common method of ensuring that these processes make progress is to take checkpoints; this becomes more complicated when the processes communicate with one another. This paper presents a distributed non-blocking coordinated checkpointing algorithm that ensures globally consistent checkpoint images are produced. These consistent checkpoint images can be used to migrate application processes to different computing nodes when a failure takes place.
Keywords :
checkpointing; distributed algorithms; grid computing; application process migration; crash; distributed counter-based nonblocking coordinated checkpoint algorithm; distributed systems; failures; global consistent checkpoint images; grid computing; inter-communication processes; Algorithm design and analysis; Checkpointing; Indexes; Message passing; Nominations and elections; Radiation detectors; Synchronization; Coordinated checkpointing; consistent state; distributed systems; fault-tolerance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Computational Tools for Engineering Applications (ACTEA), 2012 2nd International Conference on
Conference_Location :
Beirut
Print_ISBN :
978-1-4673-2488-5
Type :
conf
DOI :
10.1109/ICTEA.2012.6462909
Filename :
6462909