• DocumentCode
    3018636
  • Title
    Event Logging: Portable and Efficient Checkpointing in Heterogeneous Environments with Non-FIFO Communication Platforms
  • Author
    Peng, Zhao; Lastovetsky, Alexey
  • Author_Institution
    Dept. of Comput. Sci., Univ. Coll. Dublin, Ireland
  • fYear
    2005
  • fDate
    04-08 April 2005
  • Abstract
    The Chandy-Lamport checkpointing algorithm is widely used in fault-tolerant implementations of MPI. However, it assumes that message passing has the FIFO property, which the MPI standard does not guarantee at the application level. Therefore, this algorithm cannot serve as the basis for an implementation-independent fault-tolerant MPI. In this paper, we present a variant of the Chandy-Lamport algorithm that does not rely on the FIFO property. This algorithm can be implemented on top of MPI and can therefore be used to develop a supplementary software component that enables fault tolerance in any MPI implementation compliant with the MPI standard. We prove the correctness of the algorithm and analyze its performance. We also present experimental results demonstrating its efficiency.
  • Keywords
    application program interfaces; checkpointing; fault tolerant computing; message passing; object-oriented programming; parallel programming; program verification; Chandy-Lamport checkpointing algorithm; FIFO property; MPI standard; algorithm correctness proving; event logging; fault tolerant MPI; heterogeneous environment; message passing; non-FIFO communication platform; software component; Application software; Checkpointing; Computer science; Educational institutions; Fault tolerance; Message passing; Performance analysis; Software algorithms; Software standards; Standards development;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
  • Print_ISBN
    0-7695-2312-9
  • Type
    conf
  • DOI
    10.1109/IPDPS.2005.207
  • Filename
    1419959