  • DocumentCode
    3010477
  • Title
    Static Analysis for Application-Level Checkpointing of MPI Programs
  • Author
    Wang, Panfeng ; Du, Yunfei ; Fu, Hongyi ; Yang, Xuejun ; Zhou, Haifang
  • Author_Institution
    Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha
  • fYear
    2008
  • fDate
    25-27 Sept. 2008
  • Firstpage
    548
  • Lastpage
    555
  • Abstract
    Application-level checkpointing is a promising technology for large-scale scientific computing. The consistency of the global checkpoint must be guaranteed in order to restore the computation correctly. Usually, complex coordinated protocols are employed to ensure this consistency, and they require logging orphan or in-transit messages during checkpointing. These protocols complicate recovery and increase checkpoint overhead because of message logging. This paper proposes a new method that ensures the consistency of the global checkpoint by static analysis. The method identifies safe checkpointing regions in MPI programs, in which the global checkpoint is always strongly consistent, and places all checkpoints in those regions. During checkpointing, the method logs no messages and introduces no extra overhead. The method was implemented and integrated into ALEC, a source-to-source precompiler for automating application-level checkpointing. Experimental results show that the method is effective.
  • Keywords
    application program interfaces; checkpointing; message passing; program diagnostics; ALC; MPI programs; application-level checkpointing; in-transit messages; logging orphan; static analysis; Automatic logic units; Checkpointing; Distributed processing; High performance computing; Laboratories; Large-scale systems; Performance analysis; Programming profession; Protocols; Scientific computing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications, 2008. HPCC '08. 10th IEEE International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-0-7695-3352-0
  • Type
    conf
  • DOI
    10.1109/HPCC.2008.39
  • Filename
    4637745