• DocumentCode
    2234584
  • Title

    Process-replication technique for fault-tolerance and performance improvement in distributed computing systems

  • Author

    Chiu, Jane-Ferng ; Chiu, Ge-Ming

  • Author_Institution
    Dept. of Electron. Eng. & Technol., Nat. Taiwan Inst. of Technol., Taipei, Taiwan
  • fYear
    1994
  • fDate
    2-5 Aug 1994
  • Firstpage
    236
  • Lastpage
    243
  • Abstract
    The paper presents a process-replication protocol that aims to provide fault tolerance as well as performance improvement for applications such as long-running and real-time tasks. An identical message delivery order is enforced on all replicas of a troupe using multicasts for inter- and intra-troupe communication. The detailed design of the protocol is given in the paper. The protocol is self-contained in the sense that crashes in a troupe are handled internally without affecting the operation of other troupes. The crash-handling procedure is simple, and the associated overhead during fail-free operation is small. The protocol takes advantage of the redundancy of processes to expedite the completion of a distributed task by speeding up the determination of message sequences and the transmission of outgoing data messages at the expense of small control messages. Simulation is carried out to show the performance improvement.
  • Keywords
    distributed processing; fault tolerant computing; message passing; performance evaluation; protocols; software reliability; crash-handling procedure; data messages; distributed computing systems; fail-free operation; fault-tolerance; intertroupe communication; intratroupe communication; message sequences; multicasts; overhead; performance improvement; process-replication technique; protocol; real-time tasks; redundancy; simulation; small control messages; Computational modeling; Computer crashes; Content addressable storage; Distributed computing; Fault tolerance; Fault tolerant systems; History; Multicast protocols; Redundancy; Resumes;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Distributed Computing, 1994., Proceedings of the Third IEEE International Symposium on
  • Conference_Location
    San Francisco, CA
  • Print_ISBN
    0-8186-6395-2
  • Type

    conf

  • DOI
    10.1109/HPDC.1994.340239
  • Filename
    340239