• DocumentCode
    3588750
  • Title
    Effective multi-GPU communication using multiple CUDA streams and threads
  • Author
    Sourouri, Mohammed ; Gillberg, Tor ; Baden, Scott B. ; Cai, Xing
  • Author_Institution
    Simula Res. Lab., Lysaker, Norway
  • fYear
    2014
  • Firstpage
    981
  • Lastpage
    986
  • Abstract
    In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
  • Keywords
    graphics processing units; message passing; multi-threading; parallel architectures; CUDA streams; MPI; OpenMP threads; PCIe bus architecture; communication scheme; multiGPU communication; representative 3D stencil; Context; Data transfer; Graphics processing units; Instruction sets; Kernel; Synchronization; Three-dimensional displays; CUDA; GPU; OpenMP; multi-GPU; overlap communication with computation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)
  • Type
    conf
  • DOI
    10.1109/PADSW.2014.7097919
  • Filename
    7097919