DocumentCode
3301521
Title
Hierarchical replication techniques to ensure checkpoint storage reliability in grid environment
Author
Bouabache, Fatiha ; Herault, Thomas ; Fedak, Gilles ; Cappello, Franck
Author_Institution
Univ. Paris Sud-XI, Orsay
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
939
Lastpage
940
Abstract
High-performance computing plays an important role in scientific and engineering research. As the size of high-performance systems grows, the mean time between failures keeps shrinking, so fault tolerance becomes a critical property for parallel applications running on these systems. The MPI (Message Passing Interface) paradigm is currently the most widely used for writing parallel applications. However, in traditional implementations, when a failure occurs, the whole distributed application is shut down and restarted. Many solutions have been proposed to avoid this; the most widely used is rollback recovery. Rollback recovery is based on the concept of a checkpoint, which describes the state of one or more components of the system at a given point in its execution.
Keywords
application program interfaces; checkpointing; fault tolerance; grid computing; message passing; checkpoint storage reliability; grid environment; hierarchical replication technique; high performance computing; message passing interface; rollback recovery; Communication channels; Fault tolerant systems; Frequency; High performance computing; Image storage; Message passing; Protocols; Reliability engineering; Switches; Topology
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Systems and Applications, 2008. AICCSA 2008. IEEE/ACS International Conference on
Conference_Location
Doha
Print_ISBN
978-1-4244-1967-8
Electronic_ISBN
978-1-4244-1968-5
Type
conf
DOI
10.1109/AICCSA.2008.4493654
Filename
4493654