• DocumentCode
    3556947
  • Title

    Crash recovery with little overhead

  • Author

    Juang, Tony T.-Y.; Venkatesan, S.

  • Author_Institution
    Comput. Sci. Program, Texas Univ. at Dallas, Richardson, TX, USA
  • fYear
    1991
  • fDate
    20-24 May 1991
  • Firstpage
    454
  • Lastpage
    461
  • Abstract
    Recovering from processor failures in distributed systems is an important problem in the design and development of reliable systems. Two solutions to this problem which involve very little overhead are presented. Without appending any information to the messages of the application program, it is shown that it is possible to recover from failures using O(|V| |E|) messages where |V| is the number of processors and |E| is the number of communication links in the system. The second algorithm can be used to recover from processor failures without forcing nonfaulty processors to roll back under certain conditions.
  • Keywords
    fault tolerant computing; file organisation; operating systems (computers); system recovery; application program; communication links; crash recovery; distributed systems; nonfaulty processors; processor failures; reliable systems; Checkpointing; Computer crashes; Computer science; Delay; Fault tolerant systems; Hardware; History; IEL; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 1991., 11th International Conference on
  • Conference_Location
    Arlington, TX
  • Print_ISBN
    0-8186-2144-3
  • Type

    conf

  • DOI
    10.1109/ICDCS.1991.148709
  • Filename
    148709