  • DocumentCode
    3175525
  • Title
    Probabilistic checkpointing
  • Author
    Hyo-Chang Nam ; Jong Kim ; SungJe Hong ; Sunggu Lee
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pohang Univ. of Sci. & Technol., South Korea
  • fYear
    1997
  • fDate
    24-27 June 1997
  • Firstpage
    48
  • Lastpage
    57
  • Abstract
    Many optimization schemes have been proposed to reduce the overhead of checkpointing. Incremental checkpointing based on memory page protection has been one of the successful schemes used to reduce the overhead and to improve the performance of checkpointing. In this paper, we propose two checkpointing schemes, called "block encoding" and "combined block encoding", which further reduce the checkpointing overhead. The smallest unit of checkpoint data in our scheme is a block, which is smaller than a page; this reduces the amount of checkpoint data required when compared with page-based incremental checkpointing. One drawback of the proposed schemes is the possibility of aliasing in encoded words. In this paper, however, we show that the aliasing probability is near zero when an 8-byte encoded word is used. The performance of the proposed schemes is both analyzed and measured experimentally. First, we construct an analytic model that predicts the checkpointing overhead. By using this model, we can estimate the block size that produces the best performance for a given target program. Next, the proposed schemes are implemented on libckpt, a general-purpose checkpointing library for Unix-based systems which was developed at the University of Tennessee. According to our experimental results, the proposed schemes reduce the overhead by 11.7% in the best case and increase the overhead by 0.5% in the worst case in comparison with page-based incremental checkpointing. In most cases, the combined block encoding scheme shows an improvement over both block encoding and page-based incremental checkpointing.
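The block-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use libckpt. It assumes that each block is summarized by an 8-byte encoded word and that a block is written to the checkpoint only when its word differs from the one recorded at the previous checkpoint. The encoding function (BLAKE2b truncated to 8 bytes) and the names `encode_block` and `find_changed_blocks` are hypothetical choices made for illustration; the record does not specify the encoding the paper uses.

```python
import hashlib


def encode_block(block: bytes) -> bytes:
    """Return an 8-byte encoded word for a block.

    Assumption: BLAKE2b truncated to 8 bytes stands in for the paper's
    encoding, which this record does not describe.
    """
    return hashlib.blake2b(block, digest_size=8).digest()


def find_changed_blocks(memory: bytes, block_size: int, prev_codes=None):
    """Compare each block's encoded word with the previous checkpoint.

    Returns (changed, codes): the indices of blocks whose encoded word
    differs from prev_codes, and the new list of encoded words.
    A modified block whose word happens to be unchanged (aliasing)
    would be missed; that is the drawback the abstract mentions.
    """
    changed, codes = [], []
    for offset in range(0, len(memory), block_size):
        index = offset // block_size
        code = encode_block(memory[offset:offset + block_size])
        if prev_codes is None or index >= len(prev_codes) or prev_codes[index] != code:
            changed.append(index)
        codes.append(code)
    return changed, codes


if __name__ == "__main__":
    page = bytearray(4096)                  # one hypothetical 4 KiB page
    _, codes = find_changed_blocks(bytes(page), 256)
    page[300] = 1                           # modify a single byte
    changed, _ = find_changed_blocks(bytes(page), 256, codes)
    print(changed)                          # indices of blocks to save
```

In this sketch a single-byte write causes one 256-byte block to be saved rather than the whole 4 KiB page, which is the saving the abstract attributes to using a unit smaller than a page. The cost is the time spent computing encoded words, which is why block size matters in the analytic model.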
  • Keywords
    encoding; fault tolerant computing; optimisation; performance evaluation; system recovery; 8-byte encoded word; aliasing probability; analytic model; block encoding; block size; combined block encoding; general-purpose checkpointing library; incremental checkpointing; memory page protection; optimization schemes; performance; probabilistic checkpointing; Checkpointing; Computer science; Costs; Delay; Encoding; Fault tolerant systems; Libraries; Multiprocessing systems; Performance analysis; Protection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Computing, 1997. FTCS-27. Digest of Papers., Twenty-Seventh Annual International Symposium on
  • Conference_Location
    Seattle, WA, USA
  • ISSN
    0731-3071
  • Print_ISBN
    0-8186-7831-3
  • Type
    conf
  • DOI
    10.1109/FTCS.1997.614077
  • Filename
    614077