DocumentCode :
3105383
Title :
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
Author :
Nukada, Akira ; Ogata, Yasuhiko ; Endo, Toshio ; Matsuoka, Satoshi
Author_Institution :
Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2008
fDate :
15-21 Nov. 2008
Firstpage :
1
Lastpage :
11
Abstract :
Most GPU performance "hype" has focused on tightly coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines with substantial memory bandwidth; effective programming methodologies for exploiting that bandwidth, however, have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, more than three times faster than any existing FFT implementation on GPUs, including CUFFT. Careful programming techniques are employed to fully exploit modern GPU hardware characteristics while overcoming their limitations, including on-chip shared memory utilization, optimizing the number of threads and registers through appropriate localization, and avoiding low-speed strided memory accesses. Applied to real applications, our kernel achieves an orders-of-magnitude improvement in power and cost versus performance metrics. The off-card bandwidth limitation is still an issue; it could be alleviated somewhat by confining application kernels within the card, while the ideal solution is faster GPU interfaces.
Keywords :
computer graphic equipment; fast Fourier transforms; GPU; NVIDIA CUDA; bandwidth intensive 3D FFT kernel; graphics processing units; offcard bandwidth limitation; Acceleration; Bandwidth; Computer architecture; Computer languages; Graphics; High performance computing; Kernel; Permission; Registers; Yarn;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-2834-2
Electronic_ISBN :
978-1-4244-2835-9
Type :
conf
DOI :
10.1109/SC.2008.5213210
Filename :
5213210