DocumentCode
1887513
Title
Checkpointing SPMD applications on transputer networks
Author
Silva, Luis Moura ; Veer, Bart ; Silva, Joao Gabriel
Author_Institution
Coimbra Univ., Portugal
fYear
1994
fDate
23-25 May 1994
Firstpage
694
Lastpage
701
Abstract
Providing fault tolerance for parallel/distributed applications is a problem of paramount importance. The overall failure rate of the system increases with the number of processors, and the failure of just one processor can lead to the complete crash of the program. Checkpointing mechanisms are a good candidate for ensuring the continuity of applications in the event of failures. In this paper, we present an experimental study of several variations of checkpointing for SPMD (single program, multiple data) applications. We used a typical benchmark to experimentally assess the overhead, advantages and limitations of each checkpointing scheme.
Keywords
fault tolerant computing; parallel processing; performance evaluation; system recovery; transputer systems; SPMD applications; application continuity; benchmark; checkpointing scheme; distributed applications; failure rate; fault-tolerance; overhead assessment; parallel applications; program crash; transputer networks; Application software; Checkpointing; Concurrent computing; Electronic mail; Fault tolerance; Libraries; Master-slave; Parallel processing; Parallel programming; Programming profession;
fLanguage
English
Publisher
IEEE
Conference_Titel
Scalable High-Performance Computing Conference, 1994., Proceedings of the
Conference_Location
Knoxville, TN
Print_ISBN
0-8186-5680-8
Type
conf
DOI
10.1109/SHPCC.1994.296709
Filename
296709