DocumentCode
3075503
Title
Streaming FFT Asynchronously on Graphics Processor Units
Author
Zhao Lili ; Shengbing, Zhang ; Meng, Zhang ; Yi, Zhang
Author_Institution
Eng. Res. Center of Embedded Syst. Integration, Northwestern Polytech. Univ. (NWPU), Xi'an, China
Volume
1
fYear
2010
fDate
16-18 July 2010
Firstpage
308
Lastpage
312
Abstract
The Fast Fourier Transform (FFT), a memory-access-intensive algorithm that follows a divide-and-conquer strategy, is one of the most important and heavily used kernels in scientific computing. The newest generation of Graphics Processor Units (GPUs) implements a stream architecture in addition to acting as a powerful massively parallel coprocessor. Furthermore, the introduction of APIs for general-purpose computation on GPUs makes GPUs an attractive choice for high-performance numerical and scientific computing. In this work we deal with the implementation of the FFT on a novel NVIDIA GPU, using the CUDA programming model. By optimizing the organization of signal data, exploiting the memory hierarchy, and associating streams with different operations, we efficiently overlap kernel execution and data transfer. Our results indicate a significant performance improvement over GPU-based and CPU-based FFT algorithms. On average, the speedup is 18 percent higher than that of the original GPU-based implementation.
Keywords
application program interfaces; computer graphic equipment; coprocessors; fast Fourier transforms; general purpose computers; parallel programming; API; CUDA programming model; FFT; GPU; divide and conquer strategy; fast Fourier Transform; general purpose computation; graphics processor units; parallel coprocessor; stream architecture; streaming; Graphics; Graphics processing unit; Instruction sets; Kernel; Memory management; Programming; FFT; GPUs; asynchronous communication; stream;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology and Applications (IFITA), 2010 International Forum on
Conference_Location
Kunming
Print_ISBN
978-1-4244-7621-3
Electronic_ISBN
978-1-4244-7622-0
Type
conf
DOI
10.1109/IFITA.2010.76
Filename
5635067