• DocumentCode
    3443928
  • Title

    An efficient coordinated checkpointing scheme for multicomputers

  • Author

    Sharma, Debendra Das ; Pradhan, Dhiraj K.

  • Author_Institution
    Hewlett-Packard Co., Roseville, CA, USA
  • fYear
    1994
  • fDate
    12-14 Jun 1994
  • Firstpage
    36
  • Lastpage
    42
  • Abstract
    A new approach for checkpointing multicomputer applications is presented. Checkpointing is initiated and controlled by a checkpoint coordinator, which resides either on one of the nodes running the application or on the host processor attached to the multicomputer. A message count is used to determine whether any messages are in transit. The proposed strategy is hardware-independent and can be implemented in any multicomputer system, irrespective of its architecture, interconnection, and routing strategy. The scheme can be used with FIFO and non-FIFO channels, as well as with channels where messages can be lost. Simulation results indicate that the proposed strategy outperforms an existing scheme proposed for fixed-path wormhole-routed multicomputer systems. Although the proposed strategy is targeted at high-performance, massively parallel multicomputers, it can also be used in any general-purpose distributed system to reduce checkpointing overhead.
  • Keywords
    distributed processing; fault tolerant computing; parallel architectures; checkpoint coordinator; coordinated checkpointing scheme; fixed-path wormhole-routed multicomputer systems; general-purpose distributed system; host processor; message count; multicomputers; Application software; Checkpointing; Computer crashes; Computer science; Concurrent computing; Distributed computing; Fault tolerant systems; Processor scheduling; Routing; Supercomputers
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Parallel and Distributed Systems, 1994., Proceedings of IEEE Workshop on
  • Conference_Location
    College Station, TX
  • Print_ISBN
    0-8186-6807-5
  • Type

    conf

  • DOI
    10.1109/FTPDS.1994.494472
  • Filename
    494472
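
The abstract states that a message count is used to determine whether any messages are in transit, but it gives no details of the protocol. The following is a minimal sketch of that general idea only, not the authors' algorithm. It assumes each node reports cumulative sent and received counters to the coordinator, and that the reports form a consistent snapshot (for example, because nodes pause sending while reporting). The names `NodeReport`, `messages_in_transit`, and `can_commit_checkpoint` are hypothetical. Lost messages, which the abstract says the scheme tolerates, are not handled here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeReport:
    """Counters one node reports to the coordinator (hypothetical structure)."""
    node_id: int
    sent: int      # application messages this node has sent so far
    received: int  # application messages this node has received so far


def messages_in_transit(reports: Iterable[NodeReport]) -> int:
    """Return how many messages were sent but not yet received, system-wide."""
    reports = list(reports)
    total_sent = sum(r.sent for r in reports)
    total_received = sum(r.received for r in reports)
    return total_sent - total_received


def can_commit_checkpoint(reports: Iterable[NodeReport]) -> bool:
    """Assumption: the coordinator commits only when no message is in flight."""
    return messages_in_transit(reports) == 0


if __name__ == "__main__":
    # Node 0 has sent 3 messages, node 1 has sent 1; one message is undelivered.
    pending = [NodeReport(0, sent=3, received=1), NodeReport(1, sent=1, received=2)]
    print(messages_in_transit(pending), can_commit_checkpoint(pending))
```

Comparing global totals rather than per-channel counters keeps the check independent of routing and of message ordering, which is consistent with the abstract's claim that the scheme works for FIFO and non-FIFO channels; how the actual scheme arranges this is not stated in the record.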