DocumentCode
3647399
Title
Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes
Author
Berkin Akin;Peter A. Milder;Franz Franchetti;James C. Hoe
Author_Institution
Electr. &
fYear
2012
fDate
4/1/2012 12:00:00 AM
Firstpage
188
Lastpage
191
Abstract
Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. Efficient use of memory bandwidth must become a first-class design consideration in order to fully utilize the processing capability of highly concurrent processing platforms like FPGAs. In this paper, we present key aspects of this challenge in developing FPGA-based implementations of the two-dimensional fast Fourier transform (2D-FFT), where the large datasets must reside off-chip in DRAM. Our scalable implementations address the memory bandwidth bottleneck through both (1) algorithm design to enable efficient DRAM access patterns and (2) data path design to extract the maximum compute throughput for a given level of memory bandwidth. We present results for double-precision 2D-FFTs up to size 2,048-by-2,048. On an Altera DE4 platform, our implementation of the 2,048-by-2,048 2D-FFT can achieve over 19.2 Gflop/s from the 12 GByte/s maximum DRAM bandwidth available. The results also show that our FPGA-based implementations of 2D-FFT are more efficient than 2D-FFT running on state-of-the-art CPUs and GPUs in terms of bandwidth and power efficiency.
Keywords
"Bandwidth","Random access memory","Tiles","Field programmable gate arrays","Algorithm design and analysis","System-on-a-chip","Vectors"
Publisher
IEEE
Conference_Titel
Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on
Print_ISBN
978-1-4673-1605-7
Type
conf
DOI
10.1109/FCCM.2012.40
Filename
6239813