• DocumentCode
    3501424
  • Title

    FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?

  • Author

    Hoarau, William ; Lemarinier, Pierre ; Herault, Thomas ; Rodriguez, Eric ; Tixeuil, Sébastien ; Cappello, Franck

  • Author_Institution
    Univ. Paris Sud-XI, LRI, Orsay
  • fYear
    2006
  • fDate
    25-28 Sept. 2006
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    One of the topics of paramount importance in the development of cluster and grid middleware is the impact of faults, since their occurrence in grid infrastructures and in large-scale distributed systems is common. MPI (Message Passing Interface) is a popular abstraction for programming distributed and parallel applications. FAIL (FAult Injection Language) is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.
  • Keywords
    fault tolerant computing; message passing; middleware; FAIL-MPI; FAult Injection Language; abstract language; distributed programming; fault occurrence description; fault-tolerant MPI; message passing interface; middleware; parallel programming; qualitative faults; quantitative faults; stress testing; Application software; Fault tolerance; Fault tolerant systems; Large-scale systems; Message passing; Middleware; Protocols; Stress; System testing; Vehicle crash testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing, 2006 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1552-5244
  • Print_ISBN
    1-4244-0327-8
  • Electronic_ISBN
    1552-5244
  • Type

    conf

  • DOI
    10.1109/CLUSTR.2006.311851
  • Filename
    4100357