Title :
Performance and energy limits of a processor-integrated FFT accelerator
Author :
Tung Thanh-Hoang ; Shambayati, Amirali ; Deutschbein, Calvin ; Hoffmann, Henry ; Chien, Andrew A.
Author_Institution :
Dept. of Comput. Sci., Univ. of Chicago, Chicago, IL, USA
Abstract :
Accelerators have long been used to improve the performance and energy efficiency of embedded signal processing systems relying on Fast Fourier Transforms (FFTs). We explore the benefits of processor-integrated FFT accelerators, characterizing their performance and energy efficiency for current and future memory architectures. First, we consider designs that deeply integrate an FFT accelerator into a simple 5-stage RISC pipeline and evaluate their performance and energy efficiency for a 32 nm process. Our results indicate that a 64-point processor-integrated FFT accelerator alone can increase performance for 4k-point and 32k-point 1D FFTs by 7-fold and 4-fold, respectively. In terms of energy efficiency, the 64-point FFT accelerator yields at least a 4-fold improvement. Second, since memory performance is a critical constraint, we evaluate system configurations with 3D-stacked DRAM. Our results indicate that energy efficiency bottlenecks can be alleviated, as the 3D-stacked memory reduces energy by nearly 14-fold. When combined with our FFT accelerator, overall energy efficiency for 4k-point and 32k-point FFTs increases 86-fold and 70-fold, respectively. Prospectively, with the addition of a data layout transformation engine, cycle count and energy for the data transpose phase can be reduced 10-fold. Such a step would increase the accelerator benefit at least 10-fold in energy for DDR3 and more than 100-fold in a 3D-stacked memory system.
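Illustrative note: the abstract does not state how a fixed 64-point unit is applied to a 4k-point transform. One standard possibility is the Cooley-Tukey "four-step" decomposition, which splits an N = N1 x N2 point FFT into small FFTs, a twiddle multiplication, and a transpose (4096 = 64 x 64). The sketch below is a minimal pure-Python version under that assumption; `small_dft` and `fft_via_small_kernels` are hypothetical names, and `small_dft` is only a software stand-in for a fixed-size kernel, not the authors' hardware design.

```python
import cmath


def small_dft(v):
    """Naive DFT of a short vector (software stand-in for a fixed-size FFT kernel)."""
    n = len(v)
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    return [sum(v[j] * w[(j * k) % n] for j in range(n)) for k in range(n)]


def fft_via_small_kernels(x, n1, n2):
    """Compute a length n1*n2 DFT using only n2-point and n1-point DFTs.

    Indexing: input n = i + n1*j, output k = n2*k1 + k2.
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: n1 strided sub-sequences, each transformed by an n2-point DFT.
    rows = [small_dft(x[i::n1]) for i in range(n1)]

    # Step 2: twiddle factors exp(-2*pi*i * i*k2 / n) couple the two stages.
    for i in range(n1):
        for k2 in range(n2):
            rows[i][k2] *= cmath.exp(-2j * cmath.pi * i * k2 / n)

    # Step 3: transpose (read columns) and apply n1-point DFTs.
    out = [0j] * n
    for k2 in range(n2):
        col = small_dft([rows[i][k2] for i in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


if __name__ == "__main__":
    # Small self-check: the 4 x 2 decomposition should match a direct 8-point DFT.
    x = [complex(t, -t) for t in range(8)]
    err = max(abs(a - b) for a, b in zip(fft_via_small_kernels(x, 4, 2), small_dft(x)))
    print("max abs error:", err)
```

For a 4k-point input the call would be `fft_via_small_kernels(x, 64, 64)`. The column reads in step 3 are the strided memory accesses that correspond to the data transpose phase the abstract identifies as a target for a layout transformation engine.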
Keywords :
DRAM chips; embedded systems; energy conservation; fast Fourier transforms; signal processing; storage management chips; 3D stacked memory system; 3D-stacked DRAM systems; 5-stage RISC pipeline; DDR3; cycle count; data layout transformation engine; data transpose phase; embedded signal processing systems; energy efficiency; energy limits; memory architectures; memory performance; processor-integrated FFT accelerators; Acceleration; Bandwidth; Hardware; Pipelines; Reduced instruction set computing; Registers; Vectors
Conference_Titel :
High Performance Extreme Computing Conference (HPEC), 2014 IEEE
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4799-6232-7
DOI :
10.1109/HPEC.2014.7040951