• DocumentCode
    3144559
  • Title

    A Framework for Elastic Execution of Existing MPI Programs

  • Author

    Raveendran, Aarthi ; Bicer, Tekin ; Agrawal, Gagan

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    940
  • Lastpage
    947
  • Abstract
    There is a clear trend towards using cloud resources in the scientific or the HPC community, with a key attraction of cloud being the elasticity it offers. In executing HPC applications on a cloud environment, it will clearly be desirable to exploit elasticity of cloud environments, and increase or decrease the number of instances an application is executed on during the execution of the application, to meet time and/or cost constraints. Unfortunately, HPC applications have almost always been designed to use a fixed number of resources. This paper describes our initial work towards the goal of making existing MPI applications elastic for a cloud framework. Considering the limitations of the MPI implementations currently available, we support adaptation by terminating one execution and restarting a new program on a different number of instances. The components of our envisioned system include a decision layer which considers time and cost constraints, a framework for modifying MPI programs, and a cloud-based runtime support that can enable redistributing of saved data, and support automated resource allocation and application restart on a different number of nodes. Using two MPI applications, we demonstrate the feasibility of our approach, and show that outputting, redistributing, and reading back data can be a reasonable approach for making existing MPI applications elastic.
  • Keywords
    cloud computing; message passing; HPC applications; MPI programs; cloud resources; cost constraint; resource allocation; time constraint; Arrays; Cloud computing; Elasticity; Monitoring; Resource management; Runtime;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2011.240
  • Filename
    6008941