Title :
Performance of fault tolerant networks of workstations
Author_Institution :
Dept. of Electr. & Electron. Eng., Western Australia Univ., Nedlands, WA, Australia
Abstract :
Functional or dataflow models of computation enable a program's run-time system to determine which portions of a computation must be repeated when faults occur. Straightforward modifications to the run-time system of Cilk 2.0 (a threaded extension of C) enable a network-of-workstations parallel processing system to tolerate fail-stop faults of the individual processors or the network. It is shown in this paper that the overheads needed to provide this fault tolerance are mainly CPU cycles and memory, with very little additional network load being generated in the absence of faults. This makes it feasible to run long computations successfully on the typical networks of workstations found in large organisations, where ownership, control and distribution of the individual processors may be widely distributed.
Keywords :
C language; data flow computing; fault tolerant computing; multi-threading; parallel architectures; performance evaluation; workstation clusters; CPU cycle overhead; Cilk 2.0; dataflow computation model; fail-stop faults; fault-tolerant workstation network performance; functional computation model; large organisations; long computations; memory overhead; network load; parallel processing system; processor control; processor distribution; processor ownership; repeated computation; run-time system; threaded extension; Computational modeling; Computer languages; Costs; Distributed computing; Fault tolerance; Fault tolerant systems; Hip; Information processing; Intelligent systems; Workstations;
Conference_Titel :
Parallel Architectures, Algorithms, and Networks, 1999. (I-SPAN '99) Proceedings. Fourth International Symposium on
Conference_Location :
Perth/Fremantle, WA
Print_ISBN :
0-7695-0231-8
DOI :
10.1109/ISPAN.1999.778928