Title :
MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds
Author :
Zhang, Jie ; Lu, Xiaoyi ; Arnold, Mark ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Cloud computing with virtualization offers attractive flexibility and elasticity in delivering resources, providing a platform for consolidating complex IT resources in a scalable manner. However, running HPC applications efficiently on cloud computing systems still poses many challenges. One of the biggest hurdles in building efficient HPC clouds is the unsatisfactory performance of the underlying virtualized environments, and of virtualized I/O devices in particular. Recently, Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-performance interconnects such as InfiniBand and 10GigE. Because SR-IOV delivers near-native performance for inter-node communication, many cloud systems, such as Amazon EC2, use it in their production environments. Nevertheless, recent studies have shown that the SR-IOV scheme lacks locality-aware communication support, which leads to performance overheads for inter-VM communication within the same physical node. In this paper, we propose an efficient approach to building HPC clouds based on MVAPICH2 over OpenStack with SR-IOV. We first propose an extension to the OpenStack Nova system that enables the IVShmem channel in deployed virtual machines. We then present and discuss our high-performance design of a virtual-machine-aware MVAPICH2 library for OpenStack-based HPC clouds. Our design takes full advantage of high-performance SR-IOV communication for inter-node communication and of Inter-VM Shmem (IVShmem) for intra-node communication. A comprehensive performance evaluation with micro-benchmarks and HPC applications has been conducted on an experimental OpenStack-based HPC cloud and on Amazon EC2. The evaluation results on the experimental HPC cloud show that our design and extension can deliver near bare-metal performance for SR-IOV-based HPC clouds with virtualization. Further, compared with the performance on EC2, our experimental HPC cloud can exhibit up to 160X, 65X, and 12X improvement potential in terms of point-to-point, collective, and application performance, respectively, for future HPC clouds.
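The abstract describes a locality-aware design: IVShmem for communication between VMs on the same physical node, SR-IOV for communication across nodes. The following is a minimal illustrative sketch of that channel-selection rule, not the MVAPICH2 implementation. It assumes each MPI rank already knows the identifier of the physical host its VM runs on. How that information is obtained is not covered here, and the names `RankInfo`, `physical_host`, `select_channel`, and `build_channel_table` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Channel(Enum):
    IVSHMEM = "IVShmem"  # intra-node: shared memory region across co-resident VMs
    SRIOV = "SR-IOV"     # inter-node: SR-IOV virtual function of the HCA


@dataclass(frozen=True)
class RankInfo:
    rank: int
    vm_name: str
    physical_host: str  # hypothetical: ID of the physical node hosting the VM


def select_channel(src: RankInfo, dst: RankInfo) -> Channel:
    """Pick IVShmem when both VMs share a physical host, else SR-IOV (sketch)."""
    if src.physical_host == dst.physical_host:
        return Channel.IVSHMEM
    return Channel.SRIOV


def build_channel_table(ranks: List[RankInfo]) -> Dict[Tuple[int, int], Channel]:
    """Precompute the channel for every ordered pair of distinct ranks."""
    return {
        (a.rank, b.rank): select_channel(a, b)
        for a in ranks
        for b in ranks
        if a.rank != b.rank
    }


if __name__ == "__main__":
    ranks = [
        RankInfo(0, "vm0", "hostA"),
        RankInfo(1, "vm1", "hostA"),
        RankInfo(2, "vm2", "hostB"),
    ]
    for pair, channel in sorted(build_channel_table(ranks).items()):
        print(pair, channel.value)
```

In this sketch, ranks 0 and 1 (co-resident VMs on `hostA`) would be paired over IVShmem, while any pair involving rank 2 on `hostB` would use SR-IOV. Precomputing the table reflects the idea that locality is detected once at startup rather than on every message. That is an assumption of this sketch, not a claim about the paper's design.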
Keywords :
cloud computing; parallel processing; virtual machines; virtualisation; Amazon EC2 cloud systems; HPC cloud computing system; IVShmem channel; IVShmem; OpenStack Nova system; complex IT resources; high-performance SR-IOV communication; high-performance design; high-performance interconnects; inter-VM Shmem; inter-VM communication; inter-node communication; intra-node communication; locality aware communication support; production environments; single root I/O virtualization technology; virtual machine aware MVAPICH2 library; Buildings; Cloud computing; Libraries; Performance evaluation; Space heating; Virtual machining; Virtualization; InfiniBand; OpenStack; SR-IOV;
Conference_Title :
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location :
Shenzhen
DOI :
10.1109/CCGrid.2015.166