• DocumentCode
    1446326
  • Title

    Diskless checkpointing

  • Author

    Plank, James S. ; Li, Kai ; Puening, Michael A.

  • Author_Institution
    Dept. of Comput. Sci., Tennessee Univ., Knoxville, TN, USA
  • Volume
    9
  • Issue
    10
  • fYear
    1998
  • fDate
    10/1/1998
  • Firstpage
    972
  • Lastpage
    986
  • Abstract
    Diskless checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motivate diskless checkpointing and present the basic diskless checkpointing scheme along with several variants for improved performance. The performance of the basic scheme and its variants is evaluated on a high-performance network of workstations and compared to traditional disk-based checkpointing. We conclude that diskless checkpointing is a desirable alternative to disk-based checkpointing that can improve the performance of distributed applications in the face of failures.
  • Keywords
    distributed processing; fault tolerant computing; system recovery; disk-based checkpointing; diskless checkpointing; distributed applications; distributed system; high-performance network; long-running computation; performance bottleneck; Checkpointing; Computer Society; Distributed computing; Error correction codes; Fault tolerance; Fault tolerant systems; Hardware; Programming environments; Redundancy; Workstations
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.730527
  • Filename
    730527