• DocumentCode
    167520
  • Title
    Coordination Languages and MPI Perturbation Theory: The FOX Tuple Space Framework for Resilience
  • Author
    Wilke, Jeremiah J.
  • Author_Institution
    Scalable Modeling & Anal., Sandia Nat. Labs., Livermore, CA, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1208
  • Lastpage
    1217
  • Abstract
    Coordination languages are an established programming model for distributed computing, but have been largely eclipsed by message passing (MPI) in scientific computing. In contrast to MPI, parallel workers never directly communicate, instead "coordinating" indirectly via key-value store puts and gets. Coordination often focuses on program expressiveness, making parallel codes easier to implement. However, coordination also benefits resilience since the key-value store acts as a virtualization layer. Coordination languages (notably Linda) were therefore leading candidates for fault-tolerance in the early '90s. We present the FOX tuple space framework, an extension of Linda ideas focused primarily on transitioning MPI codes to coordination programming. We demonstrate the notion of "MPI Perturbation Theory," showing how MPI codes can be naturally generalized to the tuple-space framework. We also consider details of high-performance interconnects, showing how intelligent use of RDMA hardware allows virtualization with minimal added latency. The framework is shown to be resilient to degradation of individual nodes, automatically rebalancing for minimal performance loss. Future fault-tolerant extensions are discussed.
  • Keywords
    application program interfaces; fault tolerant computing; message passing; programming languages; virtualisation; FOX tuple space framework; MPI codes; MPI perturbation theory; RDMA hardware; coordination languages; coordination programming model; distributed computing; fault tolerance; fault tolerant extensions; high performance interconnects; message passing; parallel codes; program expressiveness; resilience; scientific computing; virtualization layer; Arrays; Computational modeling; Fault tolerance; Fault tolerant systems; Programming; Resilience; Runtime; asynchronous execution models; fault-tolerance; many-task models; resilience; work stealing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2014.136
  • Filename
    6969518