  • DocumentCode
    2053861
  • Title
    Multicore/GPGPU Portable Computational Kernels via Multidimensional Arrays
  • Author
    Edwards, H. Carter; Sunderland, Daniel; Amsler, Chris; Mish, Sam
  • Author_Institution
    Comput. Res. Center, Sandia Nat. Labs., Albuquerque, NM, USA
  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    363
  • Lastpage
    370
  • Abstract
    Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Trilinos-Kokkos array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) there exist one or more manycore compute devices, each with its own memory space; (2) data-parallel kernels are executed via parallel-for and parallel-reduce operations; and (3) kernels operate on multidimensional arrays. Kernel execution performance is extremely dependent on data access patterns, especially for NVIDIA® GPGPU devices. The optimal data access pattern can differ from one manycore device to another, potentially leading to a different implementation of each computational kernel for each device. The Trilinos-Kokkos programming model supports performance-portable kernels by separating data access patterns from computational kernels through a multidimensional array API. Through this API, device-specific mappings of multi-indices to device memory are introduced into a computational kernel through compile-time polymorphism, i.e., without modification of the kernel.
  • Keywords
    computer graphic equipment; coprocessors; multiprocessing systems; API; CPU-multicore accelerator devices; GPGPU accelerator devices; NVIDIA® GPGPU devices; Trilinos-Kokkos array programming model; application programming interfaces; compile-time polymorphism; mathematical models; multicore-GPGPU portable computational kernels; multidimensional arrays; Arrays; Computational modeling; Instruction sets; Kernel; Performance evaluation; Programming; Semantics; GPGPU; Parallel programming; manycore; multicore;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2011.47
  • Filename
    6061195