DocumentCode :
3018413
Title :
Combining FT-MPI with H2O: Fault-Tolerant MPI Across Administrative Boundaries
Author :
Kurzyniec, Dawid ; Sunderam, Vaidy
Author_Institution :
Dept. of Math. & Comput. Sci., Emory Univ., Atlanta, GA, USA
fYear :
2005
fDate :
04-08 April 2005
Abstract :
We observe increasing interest in aggregating geographically distributed, heterogeneous resources to perform large-scale computations. MPI remains the most popular programming paradigm for such applications; however, as the size of computing environments increases, fault tolerance becomes critically important. We argue that the fault tolerance model proposed by FT-MPI fits well in geographically distributed environments, even though its current implementation is confined to a single administrative domain. We propose to overcome this limitation by combining FT-MPI with the H2O resource sharing framework. Our approach allows users to run fault-tolerant MPI programs on heterogeneous, geographically distributed shared machines, without sacrificing performance and with minimal involvement of resource providers.
Keywords :
application program interfaces; message passing; resource allocation; software fault tolerance; MPI program; distributed shared machines; fault-tolerant system; resource sharing framework; Application software; Computer architecture; Computer science; Current measurement; Distributed computing; Fault tolerance; Large-scale systems; Middleware; Resource management; Water
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
Print_ISBN :
0-7695-2312-9
Type :
conf
DOI :
10.1109/IPDPS.2005.141
Filename :
1419948