Title :
Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data
Author :
Jenkins, J. ; Dinan, James ; Balaji, Pavan ; Peterka, Tom ; Samatova, N.F. ; Thakur, Rajeev
Author_Institution :
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
Abstract :
Driven by the goals of efficient and generic communication of noncontiguous data layouts in GPU memory, for which solutions do not currently exist, we present a parallel, noncontiguous data-processing methodology through the MPI datatypes specification. Our processing algorithm utilizes a kernel on the GPU to pack arbitrary noncontiguous GPU data by enriching the datatypes encoding to expose a fine-grained, data-point level of parallelism. Additionally, the typically tree-based datatype encoding is preprocessed to enable efficient, cached access across GPU threads. Using CUDA, we show that the computational method outperforms DMA-based alternatives for several common data layouts as well as more complex data layouts for which reasonable DMA-based processing does not exist. Our method incurs low overhead for data layouts that closely match best-case DMA usage or that can be processed by layout-specific implementations. We additionally investigate usage scenarios for data packing that incur resource contention, identifying potential pitfalls for various packing strategies. We also demonstrate the efficacy of kernel-based packing in various communication scenarios, showing multifold improvement in point-to-point communication and evaluating packing within the context of the SHOC stencil benchmark and HACC mesh analysis.
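To make the idea of data-point-level parallelism concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple vector-style layout (in the spirit of MPI_Type_vector) described by three hypothetical parameters, count, blocklength, and stride, all measured in elements. The function name pack_vector is invented for illustration. Each packed element computes its own source index independently of every other element, which is the property that would let a GPU kernel assign one thread per data point; here a plain CPU loop stands in for the threads.

```python
def pack_vector(src, count, blocklength, stride):
    """Pack `count` blocks of `blocklength` elements from `src`,
    where consecutive blocks start `stride` elements apart.

    Illustrative only: parameter names are assumptions, not the paper's API.
    """
    total = count * blocklength
    packed = [None] * total
    # Every iteration depends only on i, so on a GPU each value of i
    # could be handled by its own thread (emulated here by a loop).
    for i in range(total):
        block, offset = divmod(i, blocklength)   # which block, position inside it
        packed[i] = src[block * stride + offset]  # gather from the strided layout
    return packed


# Example: take the first two columns of a 3x4 row-major matrix.
matrix = list(range(12))
print(pack_vector(matrix, count=3, blocklength=2, stride=4))
# prints [0, 1, 4, 5, 8, 9]
```

A nested or tree-structured datatype would need a more involved index computation than the single divmod shown here; the sketch covers only the flat strided case.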
Keywords :
application program interfaces; data handling; graphics processing units; message passing; parallel architectures; CUDA; DMA-based processing; GPU memory; GPU threads; HACC mesh analysis; MPI derived datatypes processing; SHOC stencil benchmark; compute unified device architecture; fine-grained data-point parallelism level; graphics processing unit; kernel-based packing strategies; message passing interface; noncontiguous GPU-resident data; noncontiguous data layouts; parallel noncontiguous data-processing methodology; tree-based datatype encoding; Computer graphics; Data models; MPI; datatype
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2013.234