Title :
A timeout-based message ordering protocol for a lightweight software implementation of TMR systems
Author :
Ezhilchelvan, Paul D. ; Brasileiro, Francisco V. ; Speirs, Neil A.
Author_Institution :
Newcastle upon Tyne Univ., UK
Abstract :
Replicated processing with majority voting is a well-known method for achieving reliability and availability. Triple modular redundant (TMR) processing is the most commonly used version of that method. Replicated processing requires that the replicas reach agreement on the order in which input requests are to be processed. Almost all synchronous and deterministic ordering protocols published in the literature are time-based in the sense that they require replicas' clocks to be kept synchronized within some known bound. We present a protocol for TMR systems that is based on timeouts and does not require clocks to be kept in bounded synchronism. Our design efforts focus on keeping the ordering delays small, without an unnecessary increase in message overhead. Consequently, we are able to show that no symmetric protocol that works only with unsynchronized clocks can provide a smaller worst-case delay. We also demonstrate through analysis and experiments that our protocol is faster than a time-based one of identical message complexity in certain situations which can prevail in many application settings.
Keywords :
fault tolerance; message passing; processor scheduling; redundancy; synchronisation; Byzantine failure; logical clock; message ordering; physical clock; process replication; synchronization; timeout-based ordering protocol; triple modular redundancy; Availability; Clocks; Computer Society; Delay; Fault tolerance; Nuclear magnetic resonance; Protocols; Redundancy; Synchronization; Voting
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2004.1264786