Title :
Fault tolerance in the WebCom metacomputer
Author :
Morrison, John P. ; Kennedy, James J. ; Power, David A.
Author_Institution :
Nat. Univ. of Ireland, Cork, Ireland
Abstract :
This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined, which relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.
Keywords :
Internet; distributed memory systems; distributed processing; fault tolerant computing; WebCom metacomputer; computation platform; fault recovery; fault survival; fault tolerance; processor replacement procedure; Character generation; Computer science; Costs; Distributed computing; Fault tolerance; Hardware; Internet; Redundancy; Safety; Wire
Conference_Title :
Parallel Processing Workshops, 2001. International Conference on
Conference_Location :
Valencia
Print_ISBN :
0-7695-1260-7
DOI :
10.1109/ICPPW.2001.951958