DocumentCode
3501424
Title
FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?
Author
Hoarau, William ; Lemarinier, Pierre ; Herault, Thomas ; Rodriguez, Eric ; Tixeuil, Sébastien ; Cappello, Franck
Author_Institution
Univ. Paris Sud-XI, LRI, Orsay
fYear
2006
fDate
25-28 Sept. 2006
Firstpage
1
Lastpage
10
Abstract
One of the topics of paramount importance in the development of cluster and grid middleware is the impact of faults, since their occurrence in grid infrastructures and in large-scale distributed systems is common. MPI (Message Passing Interface) is a popular abstraction for programming distributed and parallel applications. FAIL (FAult Injection Language) is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.
Keywords
fault tolerant computing; message passing; middleware; FAIL-MPI; FAult Injection Language; abstract language; distributed programming; fault occurrence description; fault-tolerant MPI; message passing interface; parallel programming; qualitative faults; quantitative faults; stress testing; Application software; Fault tolerance; Fault tolerant systems; Large-scale systems; Message passing; Middleware; Protocols; Stress; System testing; Vehicle crash testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing, 2006 IEEE International Conference on
Conference_Location
Barcelona
ISSN
1552-5244
Print_ISBN
1-4244-0327-8
Electronic_ISBN
1552-5244
Type
conf
DOI
10.1109/CLUSTR.2006.311851
Filename
4100357