Regional Information Center for Science and Technology - Auto-tuning 3-D FFT library for CUDA GPUs

DocumentCode :

580092

Title :

Auto-tuning 3-D FFT library for CUDA GPUs

Author :

Nukada, A. ; Matsuoka, Shingo

Author_Institution :

Tokyo Inst. of Technol., Tokyo, Japan

fYear :

2009

fDate :

14-20 Nov. 2009

Firstpage :

Lastpage :

Abstract :

Existing implementations of FFTs on GPUs are optimized for specific transform sizes, such as powers of two, and exhibit unstable, peaky performance; that is, they do not perform as well at other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high-performance CUDA kernels for FFTs of varying transform sizes, alleviating this problem. Although auto-tuning has been implemented on GPUs for dense kernels such as DGEMM and stencils, this is the first instance of it being applied comprehensively to bandwidth-intensive and complex kernels such as 3-D FFTs. Bandwidth-intensive optimizations, such as selecting the number of threads and inserting padding to avoid bank conflicts in shared memory, are applied systematically. The resulting auto-tuner is fast and delivers performance that essentially beats all 3-D FFT implementations on a single processor to date; moreover, it exhibits stable performance irrespective of problem size or the underlying GPU hardware.
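To illustrate why padding helps, here is a minimal sketch (not the authors' auto-tuner) of a shared-memory bank-conflict model. It assumes 16 banks of 32-bit words with conflicts evaluated per half-warp of 16 threads, as on GT200-class GPUs of that era (newer GPUs use 32 banks per warp), and a column access in which thread t reads the first word of row t of a row-major array. The function names `conflict_degree` and `smallest_conflict_free_pad` are hypothetical. A real auto-tuner would time candidate kernels on the hardware rather than rely on a cost model like this one.

```python
def conflict_degree(row_len, pad, num_banks=16, threads=16):
    """Worst-case number of threads in one group whose distinct addresses
    fall in the same shared-memory bank (hypothetical model).

    Thread t reads word t * (row_len + pad) of a row-major 2-D array whose
    rows are padded by `pad` extra 32-bit words. Addresses are distinct, so
    accesses to the same bank serialize; the result is the slowdown factor.
    """
    stride = row_len + pad
    counts = {}
    for t in range(threads):
        bank = (t * stride) % num_banks  # bank index of this thread's word
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())


def smallest_conflict_free_pad(row_len, num_banks=16, threads=16, max_pad=8):
    """Search padding values in increasing order and return the first one
    that makes the column access conflict-free, or None if none up to max_pad."""
    for pad in range(max_pad + 1):
        if conflict_degree(row_len, pad, num_banks, threads) == 1:
            return pad
    return None


if __name__ == "__main__":
    print(conflict_degree(16, 0))          # 16-way conflict without padding
    print(conflict_degree(16, 1))          # one word of padding removes it
    print(smallest_conflict_free_pad(16))  # smallest padding that works
```

With unpadded rows of 16 words, every thread in the half-warp hits the same bank (a 16-way conflict); padding each row by one word makes the stride 17, which is coprime with 16, so all 16 threads land in different banks.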

Keywords :

fast Fourier transforms; graphics processing units; matrix multiplication; multi-threading; optimisation; parallel architectures; shared memory systems; CUDA GPU; DGEMM; GPU hardware; auto-tuning 3D FFT library; bandwidth intensive kernels; bandwidth intensive optimizations; complex kernels; dense kernels; performance CUDA kernels; stencils; transform sizes;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Conference_Location :

Portland, OR

Type :

conf

DOI :

10.1145/1654059.1654090

Filename :

6375540

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=580092