• DocumentCode
    2897808
  • Title
    A fast restart mechanism for checkpoint/recovery protocols in networked environments
  • Author
    Li, Yawei ; Lan, Zhiling
  • Author_Institution
    Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL
  • fYear
    2008
  • fDate
    24-27 June 2008
  • Firstpage
    217
  • Lastpage
    226
  • Abstract
    Checkpoint/recovery has been studied extensively, and various optimization techniques have been presented for its improvement. Regardless of the considerable research efforts, little work has been done on improving its restart latency. The time spent on retrieving and loading the checkpoint image during a recovery is non-trivial, especially in networked environments. With the ever-increasing application memory footprint and system failure rate, it is becoming more of an issue. In this paper, we present a fast restart mechanism called FREM. It allows fast restart of a failed process without requiring the availability of the entire checkpoint image. By dynamically tracking the process data accesses after each checkpoint, FREM masks restart latency by overlapping the computation of the resumed process with the retrieval of its checkpoint image. We have implemented FREM with the BLCR checkpointing tool in Linux systems. Our experiments with the SPEC benchmarks indicate that it can effectively reduce restart latency by 61.96% on average in networked environments.
  • Keywords
    checkpointing; protocols; software tools; Linux systems; checkpoint image; checkpoint-recovery protocols; fast restart mechanism; optimization techniques; restart latency; Access protocols; Checkpointing; Computer networks; Delay; Fault tolerant systems; High performance computing; Image retrieval; Information retrieval; Linux; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks With FTCS and DCC, 2008. DSN 2008. IEEE International Conference on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4244-2397-2
  • Electronic_ISBN
    978-1-4244-2398-9
  • Type
    conf
  • DOI
    10.1109/DSN.2008.4630090
  • Filename
    4630090