• DocumentCode
    564977
  • Title
    Non-blocking roll-forward recovery for message passing systems
  • Author
    Chitsaz, Behzad ; Razzazi, Mohammadreza
  • Author_Institution
    Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol., Tehran, Iran
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    339
  • Lastpage
    344
  • Abstract
    Because processes in a distributed system exchange messages, an error in one process can propagate to another through faulty messages and cause a global failure. In the absence of built-in fault detection methods, the rollback recovery approach is not useful. To avoid error propagation and rollback overhead, roll-forward recovery schemes based on redundancy techniques such as N-Version Programming have been presented. The disadvantage of these schemes is that they must block the receiver process until each received message is confirmed by the other version of the process, which results in high time overhead. When response latencies, consisting of processing time and message transmission delay, vary, these techniques are not efficient. In this paper, a non-blocking roll-forward recovery approach based on a modified duplex system is proposed. This approach does not avoid fault propagation; instead, it performs an additional test using a copy of a failed module version to discover the faulty process, replace its state with that of the fault-free process, and mask the faults that have propagated to other processes. It therefore does not need to block processing or message transmission in any phase of the process. This scheme has lower execution time than existing roll-forward techniques.
  • Keywords
    message passing; software fault tolerance; system recovery; built-in fault detection methods; distributed system; duplex system; error propagation avoidance; fault-free process; faulty messages; faulty process discovery; message passing systems; message transmission; message transmission delay; n-version programming techniques; nonblocking roll-forward recovery schemes; processing time; rollback overhead avoidance; variant response latencies; Delay; Fault detection; Fault tolerant systems; Message passing; Redundancy; Software; Causal Memory; Replicated Distributed Systems; Replication Consistency;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    MIPRO, 2012 Proceedings of the 35th International Convention
  • Conference_Location
    Opatija
  • Print_ISBN
    978-1-4673-2577-6
  • Type
    conf
  • Filename
    6240667
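
The recovery idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a simplified reading of it under stated assumptions: two versions of a process consume the same messages without waiting for each other, their states are compared at fixed intervals, and on a mismatch a spare copy re-executes the interval to decide which version diverged. The names `replay`, `duplex_roll_forward`, `spare`, and `interval`, the integer state, and the fixed comparison interval are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a process version is a pure step function
# (state, message) -> next state, with integer state for simplicity.
Step = Callable[[int, int], int]


def replay(step: Step, state: int, messages: List[int]) -> int:
    """Apply `step` to every message in order, starting from `state`."""
    for msg in messages:
        state = step(state, msg)
    return state


def duplex_roll_forward(step_a: Step, step_b: Step, spare: Step,
                        messages: List[int],
                        interval: int = 2) -> Tuple[int, List[Tuple[int, str]]]:
    """Run two versions side by side and repair whichever one diverges.

    Returns the final agreed state and a list of (interval index, culprit).
    """
    agreed = 0  # last state on which both versions agreed
    repairs: List[Tuple[int, str]] = []
    for i in range(0, len(messages), interval):
        chunk = messages[i:i + interval]
        # Neither version waits for the other while processing the chunk.
        state_a = replay(step_a, agreed, chunk)
        state_b = replay(step_b, agreed, chunk)
        if state_a != state_b:
            # Additional test: a spare copy re-executes the same chunk.
            reference = replay(spare, agreed, chunk)
            if reference == state_a:
                repairs.append((i // interval, "b"))
                state_b = state_a  # overwrite the faulty state
            elif reference == state_b:
                repairs.append((i // interval, "a"))
                state_a = state_b  # overwrite the faulty state
            else:
                raise RuntimeError("no two executions agree")
        agreed = state_a  # continue forward from the repaired state
    return agreed, repairs


def good(state: int, msg: int) -> int:
    """Fault-free version: accumulate the message values."""
    return state + msg


def faulty(state: int, msg: int) -> int:
    """Version that miscomputes on message 3 (an injected fault)."""
    return state + msg + (1 if msg == 3 else 0)


if __name__ == "__main__":
    print(duplex_roll_forward(good, faulty, good, [1, 2, 3, 4]))
```

In this sketch the fault-free version is never rolled back: only the diverging version has its state overwritten, and both continue from the agreed state. Real message passing, timing, and the propagation of faulty messages to other processes are deliberately left out.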