• DocumentCode
    262131
  • Title

    Register Caching for Stencil Computations on GPUs

  • Author

    Falch, Thomas L. ; Elster, Anne C.

  • fYear
    2014
  • fDate
    22-25 Sept. 2014
  • Firstpage
    479
  • Lastpage
    486
  • Abstract
    For most applications, taking full advantage of the memory system is key to achieving good performance on GPUs. In this paper, we introduce register caching, a novel idea where registers of multiple threads are combined and used as a shared, last-level, manually managed cache for the contributing threads. This method is enabled by the shuffle instruction recently introduced in Nvidia's Kepler GPU architecture, which allows threads in the same warp to exchange data directly, something previously possible only by going through shared memory. We evaluate our proposal with a stencil computation benchmark, achieving speedups of up to 2.04 compared to using shared memory on a GTX680 GPU. Stencil computations form the core of many scientific applications, which can therefore benefit from our proposal. Furthermore, our method is not limited to stencil computations, but is applicable to any application with a predictable memory access pattern suitable for manual caching.
  • Keywords
    cache storage; graphics processing units; shared memory systems; GTX680 GPU; Nvidia Kepler GPU architecture; data exchange; graphics processing unit; register caching; shared memory system; shuffle instruction; stencil computation; Benchmark testing; Computer architecture; Graphics processing units; Indexes; Instruction sets; Manuals; Registers; CUDA; Caching; GPU; GPU Computing; Register Caching; Stencil Computations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2014 16th International Symposium on
  • Conference_Location
    Timisoara
  • Print_ISBN
    978-1-4799-8447-3
  • Type

    conf

  • DOI
    10.1109/SYNASC.2014.70
  • Filename
    7034720
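
The register-caching idea described in the abstract can be illustrated with a minimal sketch. This is a CPU-side C++ emulation of the data movement, not GPU code and not the authors' implementation, which this record does not show. It assumes a 1D three-point stencil, a warp size of 32, and a tile layout where each warp loads 32 elements (one per lane) and produces 30 outputs, so the two edge lanes act as halo. The helpers `shfl_up` and `shfl_down` are hypothetical names that mimic the behaviour of the CUDA intrinsics `__shfl_up_sync` and `__shfl_down_sync`; all other names are also illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kWarpSize = 32;                      // assumed warp width
using WarpRegs = std::array<float, kWarpSize>;     // one "register" per lane

// Emulated shuffle-up: lane i reads the value held by lane i - delta.
// Lanes without a valid source keep their own value, as on the GPU.
WarpRegs shfl_up(const WarpRegs& r, int delta) {
    assert(delta >= 0 && delta < kWarpSize);
    WarpRegs out{};
    for (int lane = 0; lane < kWarpSize; ++lane)
        out[lane] = (lane - delta >= 0) ? r[lane - delta] : r[lane];
    return out;
}

// Emulated shuffle-down: lane i reads the value held by lane i + delta.
WarpRegs shfl_down(const WarpRegs& r, int delta) {
    assert(delta >= 0 && delta < kWarpSize);
    WarpRegs out{};
    for (int lane = 0; lane < kWarpSize; ++lane)
        out[lane] = (lane + delta < kWarpSize) ? r[lane + delta] : r[lane];
    return out;
}

// Three-point stencil where neighbours come from other lanes' registers
// instead of a shared-memory tile. Boundary outputs are left at zero.
std::vector<float> stencil_register_cache(const std::vector<float>& in) {
    const std::size_t n = in.size();
    std::vector<float> out(n, 0.0f);
    constexpr int kOutputsPerWarp = kWarpSize - 2; // edge lanes are halo only

    for (std::size_t base = 0; base + 2 < n; base += kOutputsPerWarp) {
        // Each lane loads exactly one element into its register.
        WarpRegs regs{};
        for (int lane = 0; lane < kWarpSize; ++lane) {
            const std::size_t idx = base + lane;
            regs[lane] = (idx < n) ? in[idx] : 0.0f;
        }
        // Neighbour values are exchanged directly between lanes.
        const WarpRegs left = shfl_up(regs, 1);
        const WarpRegs right = shfl_down(regs, 1);

        // Only interior lanes hold both neighbours, so only they write.
        for (int lane = 1; lane < kWarpSize - 1; ++lane) {
            const std::size_t idx = base + lane;
            if (idx + 1 < n)
                out[idx] = left[lane] + regs[lane] + right[lane];
        }
    }
    return out;
}

// Direct reference version used to check the emulation.
std::vector<float> stencil_reference(const std::vector<float>& in) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        out[i] = in[i - 1] + in[i] + in[i + 1];
    return out;
}
```

In this sketch every input element is loaded once per tile and then reused by up to three lanes through register exchange. That reuse is the role the abstract assigns to the register cache. On an actual GPU the lanes would run concurrently and the emulated helpers would be replaced by the hardware shuffle instruction.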