• DocumentCode
    3109016
  • Title

    High performance discrete Fourier transforms on graphics processors

  • Author

    Govindaraju, Naga K. ; Lloyd, Brandon ; Dotsenko, Yuri ; Smith, Burton ; Manferdelli, John

  • fYear
    2008
  • fDate
    15-21 Nov. 2008
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    We present novel algorithms for computing discrete Fourier transforms with high performance on GPUs. We present hierarchical, mixed radix FFT algorithms for both power-of-two and non-power-of-two sizes. Our hierarchical FFT algorithms efficiently exploit shared memory on GPUs using a Stockham formulation. We reduce the memory transpose overheads in hierarchical algorithms by combining the transposes into a block-based multi-FFT algorithm. For non-power-of-two sizes, we use a combination of mixed radix FFTs of small primes and Bluestein's algorithm. We use modular arithmetic in Bluestein's algorithm to improve the accuracy. We implemented our algorithms using the NVIDIA CUDA API and compared their performance with NVIDIA's CUFFT library and an optimized CPU-implementation (Intel's MKL) on a high-end quad-core CPU. On an NVIDIA GPU, we obtained performance of up to 300 GFlops, with typical performance improvements of 2-4× over CUFFT and 8-40× improvement over MKL for large sizes.
  • Keywords
    application program interfaces; computer graphic equipment; discrete Fourier transforms; mathematics computing; parallel algorithms; shared memory systems; Bluestein algorithm; CUFFT; Intel MKL; NVIDIA CUDA API; NVIDIA CUFFT library; NVIDIA GPU; Stockham formulation; discrete Fourier transform; graphics processor; hierarchical mixed-radix block-based multiFFT algorithm; high-end quadcore CPU; high-performance computing; memory transpose overhead; modular arithmetic; optimized CPU-implementation; shared memory system; small prime number; Arithmetic; Books; Central Processing Unit; Discrete Fourier transforms; Flexible printed circuits; Graphics; Hardware; High performance computing; Libraries; Signal processing algorithms;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4244-2834-2
  • Electronic_ISBN
    978-1-4244-2835-9
  • Type

    conf

  • DOI
    10.1109/SC.2008.5213922
  • Filename
    5213922