• DocumentCode
    668116
  • Title

    A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters

  • Author

    Rong Shi ; Potluri, Sreeram ; Hamidouche, Khaled ; Xiaoyi Lu ; Tomko, Karen ; Panda, Dhabaleswar K.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2013
  • fDate
    23-27 Sept. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Accelerating High-Performance Linpack (HPL) on heterogeneous clusters with multi-core CPUs and GPUs has attracted a lot of attention from the High Performance Computing community. It is becoming common for large scale clusters to have GPUs on only a subset of nodes in order to limit system costs. The major challenge for HPL in this case is to efficiently take advantage of all the CPU and GPU resources available on a cluster. In this paper, we present a novel two-level workload partitioning approach for HPL that distributes workload based on the compute power of CPU/GPU nodes across the cluster. Our approach also handles multi-GPU configurations. Unlike earlier approaches for heterogeneous clusters with CPU and GPU nodes, our design takes advantage of asynchronous kernel launches and CUDA copies to overlap computation and CPU-GPU data movement. It uses techniques such as process grid reordering to reduce MPI communication/contention while ensuring load balance across nodes. Our experimental results using 32 GPU and 128 CPU nodes of Oakley, a research cluster at Ohio Supercomputer Center, show that our proposed approach can achieve more than 80% of the combined actual peak performance of CPU and GPU nodes. This provides 47% and 63% increases over the HPL performance that can be reported using only CPU nodes and only GPU nodes, respectively.
  • Keywords
    graphics processing units; multiprocessing systems; parallel architectures; CPU-GPU data movement; CUDA; MPI communication-contention; asynchronous kernel; heterogeneous CPU-GPU cluster; high performance computing; high-performance Linpack; hybrid HPL; large scale cluster; load balance; multiGPU configuration; multicore CPU; portable approach; process grid reordering; scalable approach; two-level workload partitioning approach; Benchmark testing; Graphics processing units; Kernel; Load management; Runtime; Supercomputers; CUDA; GPU; HPL; Heterogeneity;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2013 IEEE International Conference on
  • Conference_Location
    Indianapolis, IN
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2013.6702619
  • Filename
    6702619