Title :
Effective multi-GPU communication using multiple CUDA streams and threads
Author :
Sourouri, Mohammed ; Gillberg, Tor ; Baden, Scott B. ; Cai, Xing
Author_Institution :
Simula Research Laboratory, Lysaker, Norway
Abstract :
In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
Keywords :
graphics processing units; message passing; multi-threading; parallel architectures; CUDA streams; MPI; OpenMP threads; PCIe bus architecture; communication scheme; multi-GPU communication; representative 3D stencil; Context; Data transfer; Instruction sets; Kernel; Synchronization; Three-dimensional displays; CUDA; GPU; OpenMP; multi-GPU; overlap communication with computation
Conference_Title :
2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)
DOI :
10.1109/PADSW.2014.7097919