DocumentCode :
1323317
Title :
A Methodology for Speeding Up Fast Fourier Transform Focusing on Memory Architecture Utilization
Author :
Kelefouras, Vasilios I. ; Athanasiou, George S. ; Alachiotis, Nikolaos ; Michail, Harris E. ; Kritikakou, Angeliki S. ; Goutis, Costas E.
Author_Institution :
Dept. of Electr. Eng., Patras Univ., Patras, Greece
Volume :
59
Issue :
12
fYear :
2011
Firstpage :
6217
Lastpage :
6226
Abstract :
Several state-of-the-art (SOA) self-tuning software libraries exist, such as the Fastest Fourier Transform in the West (FFTW) for the fast Fourier transform (FFT). The FFT is a highly important kernel, and the performance of its software implementations depends on how well the memory hierarchy is utilized. FFTW minimizes register spills and data cache accesses by finding a schedule that is independent of the number of registers and of the number of levels and size of the cache, which is a serious drawback. In this paper, a new methodology is presented that achieves improved performance by focusing on memory hierarchy utilization. The proposed methodology has three major advantages. First, the combination of production and consumption of the butterflies' results, data reuse, FFT parallelism, the symmetries of the twiddle factors, and the additions of zero and multiplications by zero and one that arise when twiddle factors are zero or one are all fully and simultaneously exploited. Second, the optimal solution is found according to the number of registers, the data cache sizes, the number of levels in the data cache hierarchy, the main memory page size, the associativity of the data caches and the data cache line sizes, all of which are considered simultaneously rather than separately. Third, compilation time and source code size are very small compared with FFTW. Compared with FFTW, the proposed methodology achieves a performance gain of about 40% (a speed-up of 1.7) on architectures with small data caches, where memory management has a larger effect on performance, and of 20% (a speed-up of 1.25) on average on architectures with large data caches (Pentium).
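The abstract mentions exploiting twiddle-factor symmetries and avoiding trivial multiplications. The following is a minimal Python sketch of those two ideas only, inside a textbook iterative radix-2 decimation-in-time FFT. It is not the authors' methodology, which additionally schedules butterflies around register counts and cache parameters; the function name `fft_radix2` and the `trivial` counter are hypothetical, introduced purely for illustration. The sketch assumes the input length is a power of two, stores twiddles only for k < N/2 (using W_N^(k+N/2) = -W_N^k for the rest), skips the multiply when the twiddle is 1, and replaces multiplication by -j with a swap and negate.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 DIT FFT (hypothetical illustration, not the paper's code).

    Returns (spectrum, trivial), where `trivial` counts butterflies whose
    complex twiddle multiplication was avoided (twiddle equal to 1 or -j).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Store W_N^k = exp(-2*pi*i*k/N) only for k < N/2; the symmetry
    # W_N^(k+N/2) = -W_N^k supplies the other half via the "u - t" output.
    tw = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    trivial = 0
    size = 2
    while size <= n:
        half = size // 2
        step = n // size  # stride into the length-N twiddle table
        for start in range(0, n, size):
            for k in range(half):
                e = k * step
                b = a[start + k + half]
                if e == 0:            # W = 1: no multiplication needed
                    t = b
                    trivial += 1
                elif 4 * e == n:      # W = -j: (re, im) -> (im, -re)
                    t = complex(b.imag, -b.real)
                    trivial += 1
                else:
                    t = tw[e] * b
                u = a[start + k]
                a[start + k] = u + t         # output paired with W^k
                a[start + k + half] = u - t  # output paired with W^(k+half) = -W^k
        size *= 2
    return a, trivial
```

For an 8-point input, 10 of the 12 butterflies avoid a general complex multiplication under this scheme, which hints at why such simplifications matter for small transforms; how the paper combines them with register and cache-aware scheduling is not reproduced here.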
Keywords :
cache storage; fast Fourier transforms; FFT parallelism; data cache access; data cache associativity; data cache hierarchy; data cache line size; data reuse; fast Fourier transform; memory architecture utilization; memory hierarchy utilization; memory management; memory page size; multiplication; optimal solution; registers; self-tuning software libraries; twiddle factors; Arrays; Embedded systems; Fast Fourier transforms; Libraries; Memory architecture; Memory management; Registers; Algorithms; compilers; data locality; data reuse; embedded systems; memory management; production and consumption; register spills;
fLanguage :
English
Journal_Title :
IEEE Transactions on Signal Processing
Publisher :
IEEE
ISSN :
1053-587X
Type :
jour
DOI :
10.1109/TSP.2011.2168525
Filename :
6021384