DocumentCode :
3501424
Title :
FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?
Author :
Hoarau, William ; Lemarinier, Pierre ; Herault, Thomas ; Rodriguez, Eric ; Tixeuil, Sébastien ; Cappello, Franck
Author_Institution :
Univ. Paris Sud-XI, LRI, Orsay
fYear :
2006
fDate :
25-28 Sept. 2006
Firstpage :
1
Lastpage :
10
Abstract :
One of the topics of paramount importance in the development of cluster and grid middleware is the impact of faults, since their occurrence in grid infrastructures and in large-scale distributed systems is common. MPI (Message Passing Interface) is a popular abstraction for programming distributed and parallel applications. FAIL (FAult Injection Language) is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.
Keywords :
fault tolerant computing; message passing; middleware; FAIL-MPI; FAult Injection Language; abstract language; distributed programming; fault occurrence description; fault-tolerant MPI; message passing interface; middleware; parallel programming; qualitative faults; quantitative faults; stress testing; Application software; Fault tolerance; Fault tolerant systems; Large-scale systems; Message passing; Middleware; Protocols; Stress; System testing; Vehicle crash testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing, 2006 IEEE International Conference on
Conference_Location :
Barcelona
ISSN :
1552-5244
Print_ISBN :
1-4244-0327-8
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2006.311851
Filename :
4100357