Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
Describes a new technique, based on exchanging control signals between neighboring nodes, for constructing a stable and fault-tolerant global clock in a distributed system with an arbitrary topology. It is shown that it is possible to construct a global clock reference with a time step that is much smaller than the propagation delay over the network's links. The synchronization algorithm ensures that the global clock "tick" has a stable periodicity, and it is therefore possible to tolerate failures of links and of clocks that operate faster and/or slower than nominally specified, as well as hard failures. The approach taken is to generate a global clock from the ensemble of the local transmission clocks rather than to synchronize these high-speed clocks directly. The steady-state algorithm, which generates the global clock, is executed in hardware by the network interface of each node. At the network interface, it is possible to measure the propagation delay between neighboring nodes accurately, with only a small error or uncertainty, and thereby to achieve global synchronization with an accuracy proportional to these measurement errors. It is shown that the local clock drift (or rate uncertainty) has only a secondary effect on the maximum global clock rate. The synchronization algorithm can tolerate any physical failure: it continues to operate correctly on any connected segment of the network, i.e., it can tolerate any number of link and node failures as long as the network remains connected.
Keywords :
clocks; fault tolerant computing; local area networks; reliability; synchronisation; telecommunications control; MetaNet architecture; distributed system; error measurements; fault-tolerant global clock; global clock reference; global synchronization; hard failures; high-speed control signals; link failures; local transmission clocks; network interface; node failures; propagation delay; rate uncertainty; stable periodicity; steady-state algorithm; synchronization algorithm; time step; topology; Clocks; Control systems; Fault tolerance; Fault tolerant systems; Hardware; Network interfaces; Network topology; Propagation delay; Steady-state; Synchronization;