• DocumentCode
    2257155
  • Title
    A communication-induced checkpointing algorithm using virtual checkpoint on distributed systems
  • Author
    Do-Hyung, Kim ; Chang-Soon, Park
  • Author_Institution
    Electron. & Telecommun. Res. Inst., South Korea
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    145
  • Lastpage
    150
  • Abstract
    Checkpointing is a fault-tolerance technique for recovering from faults and restarting jobs quickly. Checkpointing algorithms for distributed systems have been studied for years and can be classified into three types: coordinated, uncoordinated, and communication-induced. In this paper we propose a new communication-induced checkpointing algorithm whose checkpoint count is minimal, equivalent to that of the periodic checkpointing algorithm, and whose rollback distance is relatively short when a fault occurs. The proposed algorithm is compared by simulation with previously proposed communication-induced checkpointing algorithms. In the simulation, the proposed algorithm outperforms the other algorithms in task completion time in both fault-free and faulty situations.
  • Keywords
    distributed processing; fault tolerant computing; system recovery; virtual machines; communication-induced checkpointing algorithm; distributed systems; rollback distance; simulation; task completion time; virtual checkpoint; Checkpointing; Communication system control; Degradation; Fault tolerant systems; Force control; Hardware; Terminology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems, 2000. Proceedings. Seventh International Conference on
  • Conference_Location
    Iwate
  • ISSN
    1521-9097
  • Print_ISBN
    0-7695-0568-6
  • Type
    conf
  • DOI
    10.1109/ICPADS.2000.857693
  • Filename
    857693