• DocumentCode
    3013421
  • Title

    Egida: an extensible toolkit for low-overhead fault-tolerance

  • Author

    Rao, Sriram ; Alvisi, Lorenzo ; Vin, Harrick M.

  • Author_Institution
    Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
  • fYear
    1999
  • fDate
    15-18 June 1999
  • Firstpage
    48
  • Lastpage
    55
  • Abstract
    We discuss the design and implementation of Egida, an object-oriented toolkit designed to support transparent rollback-recovery. Egida exports a simple specification language that can be used to express arbitrary rollback recovery protocols. From this specification, Egida automatically synthesizes an implementation of the specified protocol by gluing together the appropriate objects from an available library of "building blocks". Egida is extensible and facilitates rapid implementation of rollback recovery protocols with minimal programming effort. We have integrated Egida with the MPICH implementation of the MPI standard. Existing MPI applications can take advantage of Egida without any modifications: fault-tolerance is achieved transparently, and all that is needed is a simple re-link of the MPI application with Egida.
  • Keywords
    fault tolerant computing; object-oriented programming; software tools; system recovery; Egida; extensible toolkit; low-overhead fault-tolerance; object-oriented toolkit; rollback recovery protocols; specification language; transparent rollback-recovery; Checkpointing; Electrical capacitance tomography; Fault tolerance; Libraries; Mission critical systems; NASA; Protocols; Specification languages; Sun; Supercomputers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on
  • Conference_Location
    Madison, WI, USA
  • ISSN
    0731-3071
  • Print_ISBN
    0-7695-0213-X
  • Type

    conf

  • DOI
    10.1109/FTCS.1999.781033
  • Filename
    781033