DocumentCode :
2787823
Title :
Fast Failure Detection in a Process Group
Author :
Li, Xinjie ; Brockmeyer, Monica
Author_Institution :
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI
fYear :
2007
fDate :
26-30 March 2007
Firstpage :
1
Lastpage :
10
Abstract :
Failure detectors are an important building block in distributed applications. The speed and accuracy of a failure detector are critical to the performance of the applications built on it. In a common heartbeat-based implementation of a failure detector, there is a tradeoff between speed and accuracy, so it is difficult to be both fast and accurate. Based on the observation that in many distributed applications one process takes a special role as the leader, we propose a fast failure detection (FFD) algorithm that detects the failure of the leader both quickly and accurately. Taking advantage of spatial multiple timeouts, FFD detects the failure of the leader within a time period of just a little more than one heartbeat interval, making it almost the fastest detection algorithm possible based on heartbeat messages. FFD can be used standalone in a static configuration where the leader process is fixed at one site. In a dynamic setting, where the role of leader has to be assumed by another site if the current leader fails, FFD can be used together with a leader election algorithm to speed up the election of a new leader.
Keywords :
distributed algorithms; fault diagnosis; distributed applications; fast failure detection algorithm; heartbeat messages; leader election algorithm; process group; spatial multiple timeouts; static configuration; Application software; Collaborative work; Computer crashes; Computer science; Condition monitoring; Delay effects; Detection algorithms; Detectors; Nominations and elections; Quality of service;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
Conference_Location :
Long Beach, CA
Print_ISBN :
1-4244-0910-1
Electronic_ISBN :
1-4244-0910-1
Type :
conf
DOI :
10.1109/IPDPS.2007.370296
Filename :
4228024