DocumentCode :
564977
Title :
Non-blocking roll-forward recovery for message passing systems
Author :
Chitsaz, Behzad ; Razzazi, Mohammadreza
Author_Institution :
Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol., Tehran, Iran
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
339
Lastpage :
344
Abstract :
Because processes in a distributed system exchange messages, an error in one process can propagate to another through faulty messages and cause a global failure. In the absence of built-in fault detection methods, the rollback recovery approach is not useful. To avoid error propagation and rollback overhead, roll-forward recovery schemes based on redundancy techniques such as N-Version Programming have been proposed. The disadvantage of these schemes is that they block the receiver process until each received message is confirmed by the other version of the process, which results in high time overhead. When response latencies, which consist of processing time and message transmission delay, vary, these techniques are not efficient. This paper proposes a non-blocking roll-forward recovery approach based on a modified duplex system. The approach does not avoid fault propagation. Instead, it performs an additional test using a copy of a failed module version to discover the faulty process, replace its state with that of the fault-free process, and mask the faults that have propagated to other processes. It therefore does not need to block processing or message transmission in any phase of the process. The scheme has a lower execution time than existing roll-forward techniques.
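The duplex comparison and the additional test described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (compare_and_repair, healthy, faulty), the dictionary-based state, and the use of a spare execution restarted from the last agreed checkpoint are all assumptions made for illustration. The sketch also runs sequentially, so it shows only how the faulty replica is identified and repaired, not the non-blocking message handling.

```python
from copy import deepcopy
from typing import Callable, List, Optional, Tuple

State = dict
Version = Callable[[State, List[int]], State]


def healthy(state: State, inputs: List[int]) -> State:
    """Hypothetical fault-free version: adds the inputs to a running total."""
    new = deepcopy(state)
    new["total"] += sum(inputs)
    return new


def faulty(state: State, inputs: List[int]) -> State:
    """Hypothetical version with an injected error in its result."""
    new = healthy(state, inputs)
    new["total"] += 1  # injected fault
    return new


def compare_and_repair(
    checkpoint: State,
    inputs: List[int],
    version_a: Version,
    version_b: Version,
    spare: Version,
) -> Tuple[State, Optional[str]]:
    """Run both replicas from the same checkpoint and compare their states.

    On a mismatch, re-execute the interval with a spare copy (assumed here
    to stand in for the abstract's "additional test") and keep the state
    that the spare agrees with. Returns the agreed state and the label of
    the replica judged faulty, or None if the replicas agreed.
    """
    state_a = version_a(checkpoint, inputs)
    state_b = version_b(checkpoint, inputs)
    if state_a == state_b:
        return state_a, None

    reference = spare(checkpoint, inputs)
    if reference == state_a:
        return deepcopy(state_a), "B"  # B's state is replaced by A's
    if reference == state_b:
        return deepcopy(state_b), "A"  # A's state is replaced by B's
    raise RuntimeError("no two executions agree; faulty replica not identified")


checkpoint = {"total": 10}
state, suspect = compare_and_repair(checkpoint, [1, 2, 3], healthy, faulty, healthy)
print(state, suspect)
```

In this sketch both replicas continue from the repaired state rather than rolling back to the checkpoint, which is the roll-forward idea; how the actual scheme overlaps the test with ongoing processing is not specified in the abstract.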
Keywords :
message passing; software fault tolerance; system recovery; built-in fault detection methods; distributed system; duplex system; error propagation avoidance; fault-free process; faulty messages; faulty process discovery; message passing systems; message transmission; message transmission delay; n-version programming techniques; nonblocking roll-forward recovery schemes; processing time; rollback overhead avoidance; variant response latencies; Delay; Fault detection; Fault tolerant systems; Message passing; Redundancy; Software; Causal Memory; Replicated Distributed Systems; Replication Consistency;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
MIPRO, 2012 Proceedings of the 35th International Convention
Conference_Location :
Opatija
Print_ISBN :
978-1-4673-2577-6
Type :
conf
Filename :
6240667
Link To Document :