DocumentCode :
2426321
Title :
Fault-tolerant parallel applications using queues and actions
Author :
Smith, J. ; Shrivastava, S.
Author_Institution :
Dept. of Comput. Sci., Newcastle upon Tyne Univ., UK
fYear :
1997
fDate :
11-15 Aug 1997
Firstpage :
145
Lastpage :
149
Abstract :
There are many techniques supporting execution of large computations over a network of workstations (NOW), but data-intensive computations are usually run on high-performance parallel machines. A NOW comprising individual users' machines typically has a low-performance interconnect and suffers arbitrary changes of availability. Exploiting such resources to execute data-intensive computations is difficult, but even in a more constrained environment there is an unfulfilled need for fault tolerance. The structuring approach presented fulfills this need. Performance exceeding 100 Mflop/s is demonstrated for large fault-tolerant out-of-core examples of matrix multiplication and Cholesky factorisation using five 133 MHz Pentium compute machines.
Keywords :
fault tolerant computing; matrix multiplication; parallel machines; performance evaluation; workstations; 133 MHz Pentium compute machines; Cholesky factorisation; actions; fault-tolerance; fault-tolerant parallel applications; high performance parallel machines; low performance interconnect; matrix multiplication; network of workstations; queues; Checkpointing; Computer networks; Concurrent computing; Distributed computing; Fault tolerance; File servers; High performance computing; Master-slave; Workstations; Yarn;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 1997 International Conference on Parallel Processing
Conference_Location :
Bloomington, IL
ISSN :
0190-3918
Print_ISBN :
0-8186-8108-X
Type :
conf
DOI :
10.1109/ICPP.1997.622578
Filename :
622578