  • DocumentCode
    1991270
  • Title
    Faster checkpointing with N+1 parity
  • Author
    Plank, J.S.; Kai Li
  • Author_Institution
    Dept. of Comput. Sci., Tennessee Univ., Knoxville, TN, USA
  • fYear
    1994
  • fDate
    15-17 June 1994
  • Firstpage
    288
  • Lastpage
    297
  • Abstract
    This paper presents a way to perform fast incremental checkpointing of multicomputers and distributed systems by using N+1 parity. A basic algorithm is described that uses two extra processors for checkpointing and enables the system to tolerate any single processor failure. The algorithm's speed comes from a combination of N+1 parity, extra physical memory, and virtual memory hardware, so that checkpoints need not be written to disk. This eliminates the most time-consuming portion of checkpointing. The algorithm requires each application processor to allocate a fixed amount of extra memory for checkpointing. This amount may be set statically by the programmer, and need not be equal to the size of the processor's writable address space. This alleviates a major restriction of previous checkpointing algorithms using N+1 parity. Finally, we outline how to extend our algorithm to tolerate any m processor failures with the addition of 2m extra checkpointing processors.
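The property the abstract relies on, that N+1 parity lets any single lost checkpoint be rebuilt from one parity block plus the surviving checkpoints, can be illustrated with bytewise XOR. This is a minimal sketch under simplifying assumptions: each checkpoint is an equal-length in-memory byte buffer, and all function names are hypothetical. It does not model the paper's two extra checkpointing processors, its fixed-size checkpoint buffers, or its use of virtual memory hardware.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(checkpoints: list[bytes]) -> bytes:
    """Parity block: XOR of all N checkpoint buffers (the '+1' in N+1)."""
    return reduce(xor_bytes, checkpoints)


def recover(parity_block: bytes, survivors: list[bytes]) -> bytes:
    """Rebuild the single lost checkpoint by XORing parity with the N-1 survivors."""
    return reduce(xor_bytes, survivors, parity_block)


def update_parity(parity_block: bytes, offset: int,
                  old_page: bytes, new_page: bytes) -> bytes:
    """Hypothetical incremental update: fold one changed page into the parity.

    XOR is its own inverse, so parity ^ old ^ new equals the parity that would
    be recomputed from scratch, without touching other processors' buffers.
    """
    end = offset + len(old_page)
    delta = xor_bytes(old_page, new_page)
    patched = xor_bytes(parity_block[offset:end], delta)
    return parity_block[:offset] + patched + parity_block[end:]


if __name__ == "__main__":
    # Three hypothetical application processors with 4-byte checkpoints.
    ckpts = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
    p = parity(ckpts)

    # Processor 1 fails; rebuild its checkpoint from parity and the others.
    assert recover(p, [ckpts[0], ckpts[2]]) == ckpts[1]

    # Processor 0 rewrites bytes 2..3; update parity incrementally.
    new0 = b"\x01\x02\x07\x08"
    p2 = update_parity(p, 2, ckpts[0][2:4], new0[2:4])
    assert p2 == parity([new0, ckpts[1], ckpts[2]])
    print("single-failure recovery and incremental parity update OK")
```

The incremental update is what makes XOR parity attractive for incremental checkpointing in general: only the pages that changed since the last checkpoint need to be sent to the parity holder. How the paper itself schedules and stores those updates is described in the full text, not in this sketch.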
  • Keywords
    distributed processing; fault tolerant computing; reliability; virtual storage; N+1 parity; checkpointing; distributed systems; multicomputers; single processor failure; virtual memory hardware; Checkpointing; Computer science; Debugging; Fault tolerance; Hardware; Magnetic heads; Nonvolatile memory; Programming profession; Read-write memory; Writing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Fault-Tolerant Computing, 1994. FTCS-24. Digest of Papers., Twenty-Fourth International Symposium on
  • Conference_Location
    Austin, TX, USA
  • Print_ISBN
    0-8186-5520-8
  • Type
    conf
  • DOI
    10.1109/FTCS.1994.315631
  • Filename
    315631