DocumentCode :
3508743
Title :
Fault tolerance in the WebCom metacomputer
Author :
Morrison, John P. ; Kennedy, James J. ; Power, David A.
Author_Institution :
Nat. Univ. of Ireland, Cork, Ireland
fYear :
2001
fDate :
2001
Firstpage :
245
Lastpage :
250
Abstract :
This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined, which relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.
Keywords :
Internet; distributed memory systems; distributed processing; fault tolerant computing; WebCom metacomputer; computation platform; fault recovery; fault survival; fault tolerance; processor replacement procedure; Character generation; Computer science; Costs; Distributed computing; Fault tolerance; Hardware; Internet; Redundancy; Safety; Wire
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing Workshops, 2001. International Conference on
Conference_Location :
Valencia
ISSN :
1530-2016
Print_ISBN :
0-7695-1260-7
Type :
conf
DOI :
10.1109/ICPPW.2001.951958
Filename :
951958