DocumentCode
262131
Title
Register Caching for Stencil Computations on GPUs
Author
Falch, Thomas L. ; Elster, Anne C.
fYear
2014
fDate
22-25 Sept. 2014
Firstpage
479
Lastpage
486
Abstract
For most applications, taking full advantage of the memory system is key to achieving good performance on GPUs. In this paper, we introduce register caching, a novel idea in which the registers of multiple threads are combined and used as a shared, last-level, manually managed cache for the contributing threads. This method is enabled by the shuffle instruction recently introduced in Nvidia's Kepler GPU architecture, which allows threads in the same warp to exchange data directly; previously, this was possible only by going through shared memory. We evaluate our proposal with a stencil computation benchmark and achieve speedups of up to 2.04 compared to using shared memory on a GTX680 GPU. Stencil computations form the core of many scientific applications, which can therefore benefit from our proposal. Furthermore, our method is not limited to stencil computations: it applies to any application with a predictable memory access pattern that is suitable for manual caching.
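The core idea in the abstract, where each thread keeps data in its own registers and neighbouring threads read it through shuffles instead of shared memory, can be illustrated with a small sketch. It is not the authors' implementation. It is a CPU emulation written in C++ so that it runs without a GPU. The helpers shfl_up and shfl_down are hypothetical stand-ins that model the semantics of CUDA's __shfl_up_sync / __shfl_down_sync intrinsics (__shfl_up / __shfl_down on Kepler-era toolkits). The sketch also simplifies by having each lane cache a single element, whereas the paper's scheme may cache several elements per thread. The stencil shown is an assumed 1D 3-point sum.

```cpp
#include <array>
#include <vector>

constexpr int kWarpSize = 32;
// One "register" per lane: the warp's combined registers act as the cache.
using WarpRegs = std::array<float, kWarpSize>;

// Hypothetical emulation of __shfl_up_sync(mask, var, 1 .. delta):
// lane l reads lane l - delta; lanes with no source keep their own value.
WarpRegs shfl_up(const WarpRegs& v, int delta) {
    WarpRegs r{};
    for (int l = 0; l < kWarpSize; ++l)
        r[l] = (l - delta >= 0) ? v[l - delta] : v[l];
    return r;
}

// Hypothetical emulation of __shfl_down_sync: lane l reads lane l + delta.
WarpRegs shfl_down(const WarpRegs& v, int delta) {
    WarpRegs r{};
    for (int l = 0; l < kWarpSize; ++l)
        r[l] = (l + delta < kWarpSize) ? v[l + delta] : v[l];
    return r;
}

// Assumed 3-point stencil: out[i] = in[i-1] + in[i] + in[i+1], 1 <= i <= n-2.
// Boundary outputs are left at zero.
std::vector<float> stencil3_register_cached(const std::vector<float>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size(), 0.0f);
    const int per_warp = kWarpSize - 2;  // lanes 1..30 produce outputs
    for (int base = 1; base < n - 1; base += per_warp) {
        // Each lane loads one element (halo included) from memory exactly once.
        WarpRegs reg{};
        for (int l = 0; l < kWarpSize; ++l) {
            const int idx = base - 1 + l;
            reg[l] = (idx < n) ? in[idx] : 0.0f;
        }
        // Neighbours come from other lanes' registers, not from shared memory.
        const WarpRegs left = shfl_up(reg, 1);
        const WarpRegs right = shfl_down(reg, 1);
        for (int l = 1; l < kWarpSize - 1; ++l) {
            const int i = base - 1 + l;
            if (i <= n - 2) out[i] = left[l] + reg[l] + right[l];
        }
    }
    return out;
}
```

The two edge lanes of each "warp" only supply halo values, so consecutive warp iterations overlap by two elements. On a real GPU, the per-lane loops would disappear: each thread would execute one iteration and call the shuffle intrinsic directly.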
Keywords
cache storage; graphics processing units; shared memory systems; GTX680 GPU; Nvidia Kepler GPU architecture; data exchange; graphics processing unit; register caching; shared memory system; shuffle instruction; stencil computation; Benchmark testing; Computer architecture; Graphics processing units; Indexes; Instruction sets; Manuals; Registers; CUDA; Caching; GPU; GPU Computing; Register Caching; Stencil Computations;
fLanguage
English
Publisher
IEEE
Conference_Titel
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2014 16th International Symposium on
Conference_Location
Timisoara
Print_ISBN
978-1-4799-8447-3
Type
conf
DOI
10.1109/SYNASC.2014.70
Filename
7034720