DocumentCode :
1639654
Title :
Application-Level Fault-Tolerance Solutions for Grid Computing
Author :
Diaz, David ; Pardo, Xoán C. ; Martin, Maria J. ; Gonzalez, P.
Author_Institution :
Comput. Archit. Group, A Coruna Univ., A Coruna
fYear :
2008
Firstpage :
554
Lastpage :
559
Abstract :
One of the key functionalities provided by Grid systems is the remote execution of applications. This paper introduces a research proposal on fault-tolerance mechanisms for the execution of sequential and message-passing parallel applications on the Grid. A service-based architecture called CPPC-G is proposed. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpointing instrumentation into the application code. CPPC-G services will be in charge of the submission and monitoring of the application execution, management of checkpoint files generated by CPPC-enabled applications, and detection and automatic restart of failed executions. The development of the CPPC-G architecture will involve research in several areas: storage and management of data files (checkpoint files); automatic selection of suitable computing resources; reliable detection of execution failures; and robustness issues, so that the architecture is itself fault-tolerant.
Keywords :
checkpointing; fault tolerant computing; grid computing; message passing; parallel programming; program compilers; system monitoring; application-level fault tolerance; portable checkpointing; precompiler; reliable failure detection; sequential-parallel application execution; service-based architecture; Automatic control; Computer architecture; Computerized monitoring; Condition monitoring; Fault tolerance; Instruments; Proposals; Storage automation; CPPC; Globus; grid computation; parallel computation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing and the Grid, 2008. CCGRID '08. 8th IEEE International Symposium on
Conference_Location :
Lyon
Print_ISBN :
978-0-7695-3156-4
Electronic_ISBN :
978-0-7695-3156-4
Type :
conf
DOI :
10.1109/CCGRID.2008.38
Filename :
4534262