DocumentCode
3588750
Title
Effective multi-GPU communication using multiple CUDA streams and threads
Author
Sourouri, Mohammed; Gillberg, Tor; Baden, Scott B.; Cai, Xing
Author_Institution
Simula Res. Lab., Lysaker, Norway
fYear
2014
Firstpage
981
Lastpage
986
Abstract
In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
Keywords
graphics processing units; message passing; multi-threading; parallel architectures; CUDA streams; MPI; OpenMP threads; PCIe bus architecture; communication scheme; multiGPU communication; representative 3D stencil; Context; Data transfer; Graphics processing units; Instruction sets; Kernel; Synchronization; Three-dimensional displays; CUDA; GPU; MPI; OpenMP; multi-GPU; overlap communication with computation
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
Type
conf
DOI
10.1109/PADSW.2014.7097919
Filename
7097919