DocumentCode :
1666574
Title :
Using Golomb rulers for optimal recovery schemes in fault tolerant distributed computing
Author :
Klonowska, Kamilla ; Lundberg, Lars ; Lennerstad, Håkan
Author_Institution :
Dept. of Software Eng. & Comput. Sci., Blekinge Inst. of Technol., Ronneby, Sweden
fYear :
2003
Abstract :
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e., we want to optimize the worst-case behavior. In this paper we define recovery schemes that are optimal for a number of important cases. We also show that the problem of finding optimal recovery schemes corresponds to the mathematical problem of finding Golomb rulers. These provide optimal recovery schemes for up to 373 computers in the cluster.
Keywords :
distributed processing; fault tolerant computing; system recovery; workstation clusters; Golomb rulers; clusters; fault tolerant distributed computing; optimal recovery schemes; worst-case behavior; Application software; Availability; Computer crashes; Computer errors; Computer science; Distributed computing; Fault tolerance; Fault tolerant systems; Software engineering; Sun;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
ISSN :
1530-2075
Print_ISBN :
0-7695-1926-1
Type :
conf
DOI :
10.1109/IPDPS.2003.1213390
Filename :
1213390