Multicore/GPGPU Portable Computational Kernels via Multidimensional Arrays

Author

Edwards, H. Carter ; Sunderland, Daniel ; Amsler, Chris ; Mish, Sam

Author_Institution

Comput. Res. Center, Sandia Nat. Labs., Albuquerque, NM, USA

fYear

2011

fDate

26-30 Sept. 2011

Firstpage

363

Lastpage

370

Abstract

Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Trilinos-Kokkos array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) there exist one or more manycore compute devices, each with its own memory space, (2) data-parallel kernels are executed via parallel-for and parallel-reduce operations, and (3) kernels operate on multidimensional arrays. Kernel execution performance is extremely dependent on data access patterns, especially for NVIDIA® GPGPU devices. An optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Trilinos-Kokkos programming model supports performance-portable kernels by separating data access patterns from computational kernels through a multidimensional array API. Through this API, device-specific mappings of multi-indices to device memory are introduced into a computational kernel through compile-time polymorphism, i.e., without modification of the kernel.
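
The separation of data access pattern from kernel described in the abstract can be illustrated with a minimal C++ sketch. This is not the Trilinos-Kokkos API: the names `LayoutRight`, `LayoutLeft`, `Array2D`, `ScaleRows`, `parallel_for_serial`, and `fill_scale_sum` are hypothetical and chosen only for illustration, and the serial loop merely stands in for a device-specific parallel-for dispatch. The sketch assumes a two-dimensional array whose index-to-memory mapping is selected by a template parameter, so the kernel source stays the same when the layout changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative only, not the library's API).
// Each maps a multi-index (i, j) of an n0 x n1 array to a flat offset.
struct LayoutRight {  // row-major: j is the contiguous index
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};

struct LayoutLeft {  // column-major: i is the contiguous index
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t n0, std::size_t /*n1*/) {
    return i + j * n0;
  }
};

// Two-dimensional array whose memory mapping is fixed at compile time.
template <class Layout>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1)
      : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) {
    return data_[Layout::offset(i, j, n0_, n1_)];
  }
  double operator()(std::size_t i, std::size_t j) const {
    return data_[Layout::offset(i, j, n0_, n1_)];
  }
  std::size_t extent0() const { return n0_; }
  std::size_t extent1() const { return n1_; }

 private:
  std::size_t n0_, n1_;
  std::vector<double> data_;
};

// Kernel functor: written once against operator()(i, j), unaware of layout.
template <class Array>
struct ScaleRows {
  Array& a;
  double alpha;
  void operator()(std::size_t i) const {
    for (std::size_t j = 0; j < a.extent1(); ++j) a(i, j) *= alpha;
  }
};

// Serial stand-in for a parallel-for: calls the functor once per work item.
template <class Functor>
void parallel_for_serial(std::size_t n, const Functor& f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// Fill a(i, j) = 10*i + j, scale every entry by alpha, and return the sum.
template <class Layout>
double fill_scale_sum(std::size_t n0, std::size_t n1, double alpha) {
  Array2D<Layout> a(n0, n1);
  for (std::size_t i = 0; i < n0; ++i)
    for (std::size_t j = 0; j < n1; ++j)
      a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);

  parallel_for_serial(n0, ScaleRows<Array2D<Layout> >{a, alpha});

  double sum = 0.0;
  for (std::size_t i = 0; i < n0; ++i)
    for (std::size_t j = 0; j < n1; ++j) sum += a(i, j);
  return sum;
}
```

Calling `fill_scale_sum<LayoutRight>(2, 3, 2.0)` and `fill_scale_sum<LayoutLeft>(2, 3, 2.0)` runs the same `ScaleRows` kernel over two different memory mappings. Only the template argument changes, which is the sense in which the mapping is introduced through compile-time polymorphism.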

Keywords

computer graphic equipment; coprocessors; multiprocessing systems; API; CPU-multicore accelerator devices; GPGPU accelerator devices; NVIDIA® GPGPU devices; Trilinos-Kokkos array programming model; application programming interfaces; compile-time polymorphism; mathematical models; multicore-GPGPU portable computational kernels; multidimensional arrays; Arrays; Computational modeling; Instruction sets; Kernel; Performance evaluation; Programming; Semantics; GPGPU; Parallel programming; manycore; multicore

fLanguage

English

Publisher

IEEE

Conference_Titel

Cluster Computing (CLUSTER), 2011 IEEE International Conference on

Conference_Location

Austin, TX

Print_ISBN

978-1-4577-1355-2

Electronic_ISBN

978-0-7695-4516-5

Type

conf

DOI

10.1109/CLUSTER.2011.47

Filename

6061195