• DocumentCode
    611067
  • Title

    Improving HPC Application Performance in Cloud through Dynamic Load Balancing

  • Author

    Gupta, Arpan ; Sarood, Osman ; Kale, Laxmikant V. ; Milojicic, D.

  • Author_Institution
    Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2013
  • fDate
    13-16 May 2013
  • Firstpage
    402
  • Lastpage
    409
  • Abstract
    Driven by the benefits of elasticity and the pay-as-you-go model, cloud computing is emerging as an attractive alternative and addition to in-house clusters and supercomputers for some High Performance Computing (HPC) applications. However, poor interconnect performance, a heterogeneous and dynamic environment, and interference from other virtual machines (VMs) are bottlenecks for efficient HPC in the cloud. For tightly-coupled iterative applications, one slow processor slows down the entire application, resulting in poor CPU utilization. In this paper, we present a dynamic load balancer for tightly-coupled iterative HPC applications in the cloud. It infers the static hardware heterogeneity in virtualized environments, and also adapts to the dynamic heterogeneity caused by interference arising from multi-tenancy. Through continuous live monitoring, instrumentation, and periodic refinement of the task distribution to VMs, our load balancer adapts to dynamic variations in cloud resources. Through experimental evaluation on a private cloud with 64 VMs using benchmarks and a real science application, we demonstrate performance benefits of up to 45%. Finally, we analyze the effect of load balancing frequency, problem size, and computational granularity (problem decomposition) on the performance and scalability of our techniques.
  • Keywords
    cloud computing; parallel processing; performance evaluation; resource allocation; virtual machines; virtualisation; CPU utilization; VM; cloud computing; cloud resources; computational granularity; dynamic heterogeneity; dynamic load balancing frequency; heterogeneous environment; high-performance computing applications; interconnect performance; multitenancy; pay-as-you-go model; periodic task distribution refinement; private cloud; problem decomposition; problem size; static hardware heterogeneity; tightly-coupled iterative HPC application performance improvement; virtual machines; Benchmark testing; Cloud computing; Clouds; Interference; Load management; Radio access networks; Runtime; Cloud; High Performance Computing; Load balancing; Placement; Runtime system; Virtual machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on
  • Conference_Location
    Delft
  • Print_ISBN
    978-1-4673-6465-2
  • Type

    conf

  • DOI
    10.1109/CCGrid.2013.65
  • Filename
    6546119