• DocumentCode
    1887513
  • Title

    Checkpointing SPMD applications on transputer networks

  • Author

    Silva, Luis Moura ; Veer, Bart ; Silva, Joao Gabriel

  • Author_Institution
    Coimbra Univ., Portugal
  • fYear
    1994
  • fDate
    23-25 May 1994
  • Firstpage
    694
  • Lastpage
    701
  • Abstract
    Providing fault tolerance for parallel/distributed applications is a problem of paramount importance, since the overall failure rate of the system increases with the number of processors, and the failure of just one processor can lead to the complete crash of the program. Checkpointing mechanisms are a good candidate for keeping applications running when failures occur. In this paper, we present an experimental study of several variations of checkpointing for SPMD (single program, multiple data) applications. We used a typical benchmark to experimentally assess the overhead, advantages and limitations of each checkpointing scheme.
  • Keywords
    fault tolerant computing; parallel processing; performance evaluation; system recovery; transputer systems; SPMD applications; application continuity; benchmark; checkpointing scheme; distributed applications; failure rate; fault-tolerance; overhead assessment; parallel applications; program crash; transputer networks; Application software; Checkpointing; Concurrent computing; Electronic mail; Fault tolerance; Libraries; Master-slave; Parallel processing; Parallel programming; Programming profession;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Scalable High-Performance Computing Conference, 1994., Proceedings of the
  • Conference_Location
    Knoxville, TN
  • Print_ISBN
    0-8186-5680-8
  • Type

    conf

  • DOI
    10.1109/SHPCC.1994.296709
  • Filename
    296709