DocumentCode :
2053861
Title :
Multicore/GPGPU Portable Computational Kernels via Multidimensional Arrays
Author :
Edwards, H. Carter ; Sunderland, Daniel ; Amsler, Chris ; Mish, Sam
Author_Institution :
Comput. Res. Center, Sandia Nat. Labs., Albuquerque, NM, USA
fYear :
2011
fDate :
26-30 Sept. 2011
Firstpage :
363
Lastpage :
370
Abstract :
Large, complex scientific and engineering application codes have a significant investment in computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Trilinos-Kokkos array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) there exist one or more manycore compute devices, each with its own memory space, (2) data-parallel kernels are executed via parallel-for and parallel-reduce operations, and (3) kernels operate on multidimensional arrays. Kernel execution performance is extremely dependent on data access patterns, especially for NVIDIA® GPGPU devices. An optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Trilinos-Kokkos programming model supports performance-portable kernels by separating data access patterns from computational kernels through a multidimensional array API. Through this API, device-specific mappings of multi-indices to device memory are introduced into a computational kernel through compile-time polymorphism, i.e., without modification of the kernel.
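The separation of kernel and data layout described in the abstract can be illustrated with a minimal C++ sketch. This is not the Trilinos-Kokkos API: the names `RowMajorMap`, `ColumnMajorMap`, `Array2D`, `serial_for`, `serial_reduce`, `ScaleRow`, `SumRow`, and `scaled_sum` are all hypothetical, and the serial loops merely stand in for a device's parallel-for and parallel-reduce. The point it tries to show is that the kernel functors are written once against `a(i, j)`, while the mapping from the multi-index to memory is chosen by a template parameter at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index maps: each turns a multi-index (i, j) into a flat offset.
struct RowMajorMap {     // j varies fastest in memory
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};
struct ColumnMajorMap {  // i varies fastest in memory
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t n0, std::size_t /*n1*/) {
    return i + j * n0;
  }
};

// Hypothetical 2-D array whose memory layout is a compile-time policy.
template <class Map>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1)
      : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) {
    assert(i < n0_ && j < n1_);
    return data_[Map::offset(i, j, n0_, n1_)];
  }
  std::size_t extent0() const { return n0_; }
  std::size_t extent1() const { return n1_; }

 private:
  std::size_t n0_, n1_;
  std::vector<double> data_;
};

// Serial stand-ins for a device's parallel-for and parallel-reduce.
template <class Functor>
void serial_for(std::size_t n, const Functor& f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}
template <class Functor>
double serial_reduce(std::size_t n, const Functor& f) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) f(i, total);
  return total;
}

// Kernels: written once, independent of the array's layout.
template <class Array>
struct ScaleRow {
  Array* a;
  double alpha;
  void operator()(std::size_t i) const {
    for (std::size_t j = 0; j < a->extent1(); ++j) (*a)(i, j) *= alpha;
  }
};
template <class Array>
struct SumRow {
  Array* a;
  void operator()(std::size_t i, double& total) const {
    for (std::size_t j = 0; j < a->extent1(); ++j) total += (*a)(i, j);
  }
};

// Fill a 3x4 array with 10*i + j, double every entry, and sum the result.
template <class Map>
double scaled_sum() {
  Array2D<Map> a(3, 4);
  for (std::size_t i = 0; i < a.extent0(); ++i)
    for (std::size_t j = 0; j < a.extent1(); ++j) a(i, j) = 10.0 * i + j;
  serial_for(a.extent0(), ScaleRow<Array2D<Map>>{&a, 2.0});
  return serial_reduce(a.extent0(), SumRow<Array2D<Map>>{&a});
}
```

Calling `scaled_sum<RowMajorMap>()` and `scaled_sum<ColumnMajorMap>()` runs the same two kernels over different memory layouts and yields the same sum; only the offsets computed inside `operator()` differ.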
Keywords :
computer graphic equipment; coprocessors; multiprocessing systems; API; CPU-multicore accelerator devices; GPGPU accelerator devices; NVIDIA® GPGPU devices; Trilinos-Kokkos array programming model; application programming interfaces; compile-time polymorphism; mathematical models; multicore-GPGPU portable computational kernels; multidimensional arrays; Arrays; Computational modeling; Instruction sets; Kernel; Performance evaluation; Programming; Semantics; GPGPU; Parallel programming; manycore; multicore
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
Type :
conf
DOI :
10.1109/CLUSTER.2011.47
Filename :
6061195