  • DocumentCode
    2235155
  • Title
    Parallel Matrix-Matrix Multiplication Based on HPL with a GPU-Accelerated PC Cluster
  • Author
    Wang, Qin ; Ohmura, Junichi ; Axida, Shan ; Miyoshi, Takefumi ; Irie, Hidetsugu ; Yoshinaga, Tsutomu
  • Author_Institution
    Dept. of Inf. Network Syst., Univ. of Electro-Commun., Chofu, Japan
  • fYear
    2010
  • fDate
    17-19 Nov. 2010
  • Firstpage
    243
  • Lastpage
    248
  • Abstract
    In this paper, we propose an approach that significantly improves the performance of parallel matrix-matrix multiplication on a GPU-accelerated cluster. Within a single node, we implement a double-precision general matrix-matrix multiplication (dgemm) that runs in parallel on the CPUs and the GPU, achieving performance improvements of 32% over the GPU-only case and 56% over the CPUs-only case. For the entire cluster, we apply an overlapping GPU acceleration scheme to High-Performance Linpack (HPL), which eliminates the tight dependency between the LU decomposition and the dgemm operation, and achieve a 5.72% performance improvement over the flat GPU acceleration case.
  • Keywords
    computer graphic equipment; coprocessors; matrix multiplication; parallel programming; CPU; GPU accelerated PC cluster; HPL; LU decomposition; dgemm operation; high-performance Linpack; parallel matrix-matrix multiplication; performance improvement; GPU; MPI; cluster; heterogeneous; matrix-multiplier; parallelization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Networking and Computing (ICNC), 2010 First International Conference on
  • Conference_Location
    Higashi-Hiroshima
  • Print_ISBN
    978-1-4244-8918-3
  • Electronic_ISBN
    978-0-7695-4277-5
  • Type
    conf
  • DOI
    10.1109/IC-NC.2010.39
  • Filename
    5695242
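The abstract describes splitting a single dgemm between the host CPUs and the GPU within one node, but the record does not say how the work is divided. The following Python sketch assumes one common scheme: partition the columns of B in proportion to each device's measured dgemm throughput, compute the two column blocks of C independently, and concatenate them. The functions `split_columns`, `gemm_columns` and `hybrid_dgemm` and the `cpu_gflops`/`gpu_gflops` parameters are hypothetical, and the naive triple loop stands in for real BLAS and CUBLAS calls. This is an illustration of the general idea, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Matrix = List[List[float]]


def split_columns(n: int, cpu_gflops: float, gpu_gflops: float) -> Tuple[int, int]:
    """Split n columns of B so each device gets work proportional to its
    (assumed, measured beforehand) dgemm throughput. Returns (n_gpu, n_cpu)."""
    total = cpu_gflops + gpu_gflops
    if n < 0 or cpu_gflops < 0 or gpu_gflops < 0 or total == 0:
        raise ValueError("need n >= 0 and non-negative, non-zero total throughput")
    n_gpu = round(n * gpu_gflops / total)
    return n_gpu, n - n_gpu


def gemm_columns(a: Matrix, b: Matrix, start: int, end: int) -> Matrix:
    """Compute A * B[:, start:end] with a naive triple loop.
    Hypothetical stand-in for a real BLAS (CPU) or CUBLAS (GPU) dgemm call."""
    k = len(b)
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(start, end)]
            for row in a]


def hybrid_dgemm(a: Matrix, b: Matrix, cpu_gflops: float, gpu_gflops: float) -> Matrix:
    """C = A * B, with the left column block assigned to the 'GPU'
    and the remaining columns assigned to the 'CPUs'."""
    n = len(b[0]) if b else 0
    n_gpu, _ = split_columns(n, cpu_gflops, gpu_gflops)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The two column blocks of C do not depend on each other,
        # so they can be computed concurrently.
        gpu_job = pool.submit(gemm_columns, a, b, 0, n_gpu)
        cpu_job = pool.submit(gemm_columns, a, b, n_gpu, n)
        gpu_part, cpu_part = gpu_job.result(), cpu_job.result()
    # Reassemble C row by row from the two column blocks.
    return [g + c for g, c in zip(gpu_part, cpu_part)]
```

For example, `hybrid_dgemm([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1.0, 1.0)` assigns one column to each device and returns `[[19, 22], [43, 50]]`. Because of Python's global interpreter lock, the threads here only show the structure of the split; a real node would issue native dgemm calls on each device, and the split ratio would typically be tuned empirically rather than taken from nominal peak rates.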