• DocumentCode
    2049255
  • Title
    Recent advances in checkpoint/recovery systems
  • Author
    Bronevetsky, Greg; Fernandes, Rohit; Marques, Daniel; Pingali, Keshav; Stodghill, Paul
  • Author_Institution
    Dept. of Comput. Sci., Cornell Univ., Ithaca, NY
  • fYear
    2006
  • fDate
    25-29 April 2006
  • Abstract
    Checkpoint and recovery (CPR) systems have many uses in high-performance computing. Because of this, many developers have implemented checkpointing by hand in their applications. One use of checkpointing is to help mitigate the effects of interruptions in computational service, both planned and unplanned. In fact, some supercomputing centers expect their users to use checkpointing as a matter of policy. And yet, few centers provide fully automatic checkpointing systems for their high-end production machines. This paper is a status report on our work on the family of C3 systems for (almost) fully automatic checkpointing for scientific applications. To date, we have shown that our techniques can be used for checkpointing sequential, MPI, and OpenMP applications written in C, Fortran, and several other languages. A novel aspect of our work is that we have not built a single checkpointing system; rather, we have developed a methodology and a set of techniques that have enabled us to develop a number of systems, each meeting different design goals and efficiency requirements.
  • Keywords
    checkpointing; message passing; parallel machines; OpenMP; checkpointing system; computational service; high-end production machine; high-performance computing; message passing interface; recovery system; sequential application; supercomputing center; Application software; Checkpointing; Computer crashes; Computer science; Debugging; Fault tolerance; Hardware; Production systems; Resource management; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
  • Conference_Location
    Rhodes Island
  • Print_ISBN
    1-4244-0054-6
  • Type
    conf
  • DOI
    10.1109/IPDPS.2006.1639575
  • Filename
    1639575