High performance discrete Fourier transforms on graphics processors

Author

Govindaraju, Naga K. ; Lloyd, Brandon ; Dotsenko, Yuri ; Smith, Burton ; Manferdelli, John

fYear

2008

fDate

15-21 Nov. 2008

Firstpage

1

Lastpage

12

Abstract

We present novel algorithms for computing discrete Fourier transforms with high performance on GPUs. We present hierarchical, mixed-radix FFT algorithms for both power-of-two and non-power-of-two sizes. Our hierarchical FFT algorithms efficiently exploit shared memory on GPUs using a Stockham formulation. We reduce the memory transpose overheads in hierarchical algorithms by combining the transposes into a block-based multi-FFT algorithm. For non-power-of-two sizes, we use a combination of mixed-radix FFTs of small primes and Bluestein's algorithm. We use modular arithmetic in Bluestein's algorithm to improve the accuracy. We implemented our algorithms using the NVIDIA CUDA API and compared their performance with NVIDIA's CUFFT library and an optimized CPU implementation (Intel's MKL) on a high-end quad-core CPU. On an NVIDIA GPU, we obtained performance of up to 300 GFlops, with typical performance improvements of 2-4× over CUFFT and 8-40× over MKL for large sizes.
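
The abstract's remark about modular arithmetic in Bluestein's algorithm can be illustrated with a small CPU-side sketch. This is not the authors' CUDA implementation. It is a plain Python illustration under the assumption that the accuracy gain comes from reducing the chirp exponent n^2 modulo 2N in exact integer arithmetic before it is converted to an angle. The function names `fft_pow2`, `chirp`, and `bluestein_dft` are hypothetical and chosen only for this example.

```python
import cmath


def fft_pow2(x, sign=-1):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    sign=-1 gives the forward transform, sign=+1 the unscaled inverse.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd = fft_pow2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def chirp(n, size):
    """Return exp(-i*pi*n^2/size).

    The chirp is periodic in n^2 with period 2*size, so n^2 is reduced
    modulo 2*size with exact integers. This keeps the angle small and
    avoids the rounding error of a large floating-point argument.
    """
    return cmath.exp(-1j * cmath.pi * ((n * n) % (2 * size)) / size)


def bluestein_dft(x):
    """DFT of arbitrary length via Bluestein's chirp-z convolution."""
    n = len(x)
    # Pad to a power of two that can hold the linear convolution.
    m = 1
    while m < 2 * n - 1:
        m *= 2
    w = [chirp(k, n) for k in range(n)]
    # Input pre-multiplied by the chirp, zero-padded to length m.
    a = [x[k] * w[k] for k in range(n)] + [0j] * (m - n)
    # Conjugate chirp, wrapped so negative indices land at the end.
    b = [0j] * m
    for k in range(n):
        b[k] = w[k].conjugate()
        if k:
            b[m - k] = w[k].conjugate()
    # Circular convolution through power-of-two FFTs.
    fa = fft_pow2(a)
    fb = fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    # Post-multiply by the chirp and apply the 1/m inverse scaling.
    return [w[k] * conv[k] / m for k in range(n)]
```

Calling `bluestein_dft([1.0, 2.0, 3.0, 4.0, 5.0])` computes a length-5 DFT using only power-of-two FFTs of length 16. On a GPU the inner FFTs would be replaced by the hierarchical Stockham kernels the abstract describes; the recursive radix-2 routine here is only a stand-in.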

Keywords

application program interfaces; computer graphic equipment; discrete Fourier transforms; mathematics computing; parallel algorithms; shared memory systems; Bluestein algorithm; CUFFT; Intel MKL; NVIDIA CUDA API; NVIDIA CUFFT library; NVIDIA GPU; Stockham formulation; discrete Fourier transform; graphics processor; hierarchical mixed-radix block-based multiFFT algorithm; high-end quadcore CPU; high-performance computing; memory transpose overhead; modular arithmetic; optimized CPU-implementation; shared memory system; small prime number; Arithmetic; Books; Central Processing Unit; Discrete Fourier transforms; Flexible printed circuits; Graphics; Hardware; High performance computing; Libraries; Signal processing algorithms;

fLanguage

English

Publisher

IEEE

Conference_Title

High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for

Conference_Location

Austin, TX

Print_ISBN

978-1-4244-2834-2

Electronic_ISBN

978-1-4244-2835-9

Type

conf

DOI

10.1109/SC.2008.5213922

Filename

5213922