• DocumentCode
    2611667
  • Title

    Reducing message logging overhead for log-based recovery

  • Author

    Wang, Yi-Min

  • Author_Institution
    Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
  • fYear
    1993
  • fDate
    3-6 May 1993
  • Firstpage
    1925
  • Abstract
    Checkpointing and rollback recovery is essential for long-running parallel applications. In the case of a transient fault or system crash, the affected application programs can recover from a consistent set of checkpoints saved earlier instead of restarting from the very beginning. For applications requiring transparent fault tolerance, log-based recovery can usually achieve a better recoverable state at the cost of message logging in addition to checkpointing. A simple scheme for reducing message logging overhead based on local dependency information is presented. Communication trace-driven simulation for several parallel applications is used to evaluate the benefits of the proposed scheme for real applications.
  • Keywords
    fault tolerant computing; message passing; parallel processing; system recovery; checkpointing; communication trace-driven simulation; local dependency information; log-based recovery; long-running parallel applications; message logging overhead; recoverable state; rollback recovery; system crash; transient fault; transparent fault tolerance; Application software; Checkpointing; Computer crashes; Concurrent computing; Costs; Hardware; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems, 1993., ISCAS '93, 1993 IEEE International Symposium on
  • Conference_Location
    Chicago, IL
  • Print_ISBN
    0-7803-1281-3
  • Type

    conf

  • DOI
    10.1109/ISCAS.1993.394126
  • Filename
    394126