Title :
Fault-tolerant parallel applications using queues and actions
Author :
Smith, J. ; Shrivastava, S.
Author_Institution :
Dept. of Comput. Sci., Newcastle upon Tyne Univ., UK
Abstract :
There are many techniques supporting execution of large computations over a network of workstations (NOW), but data-intensive computations are usually run on high-performance parallel machines. A NOW comprising individual users' machines typically has a low-performance interconnect and suffers arbitrary changes of availability. Exploiting such resources to execute data-intensive computations is difficult, but even in a more constrained environment there is an unfulfilled need for fault tolerance. The structuring approach presented fulfills this need. Performance exceeding 100 Mflop/s is demonstrated for large fault-tolerant out-of-core examples of matrix multiplication and Cholesky factorisation using five 133 MHz Pentium compute machines.
Keywords :
fault tolerant computing; matrix multiplication; parallel machines; performance evaluation; workstations; 133 MHz Pentium compute machines; Cholesky factorisation; actions; fault-tolerance; fault-tolerant parallel applications; high performance parallel machines; low performance interconnect; network of workstations; queues; Checkpointing; Computer networks; Concurrent computing; Distributed computing; Fault tolerance; File servers; High performance computing; Master-slave; Workstations; Yarn
Conference_Title :
Proceedings of the 1997 International Conference on Parallel Processing
Conference_Location :
Bloomington, IL
Print_ISBN :
0-8186-8108-X
DOI :
10.1109/ICPP.1997.622578