DocumentCode
3233654
Title
Exploring Data Streaming to Improve 3D FFT Implementation on Multiple GPUs
Author
da Silva, C.P. ; Cupertino, Leandro F. ; Chevitarese, Daniel ; Pacheco, Marco Aurélio C ; Bentes, Cristiana
Author_Institution
Dept. of Electr. Eng., PUC-Rio, Rio de Janeiro, Brazil
fYear
2010
fDate
27-30 Oct. 2010
Firstpage
13
Lastpage
18
Abstract
FFT is a well-known and widely used algorithm in many scientific and engineering applications. However, FFT is a memory-bound problem that still presents performance challenges to new generations of computer architectures due to its relatively low ratio of computation per memory access. For GPU architectures, where data transfers between the host CPU memory and the device memory are very expensive, the memory overhead can become a huge bottleneck for large problems. In this work, we propose an efficient parallel implementation of FFT on multiple GPUs that tackles the overhead of host memory access by implementing a streaming scheme that hides the data transfer latency. The idea is to divide the problem into smaller ones, generating several lighter, asynchronous memory transfers from host to device so that computation on data already transferred can overlap with the remaining transfers. We obtained an acceleration of approximately 60% over the non-streamed GPU implementation.
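The overlap described in the abstract can be illustrated with a toy timing model. This is a minimal sketch, not the authors' implementation: the function name `pipeline_time`, the equal-sized chunks, the single copy engine, and the example costs are all assumptions made for illustration, and the device-to-host copy of the results is ignored.

```python
def pipeline_time(total_transfer, total_compute, n_chunks):
    """Finish time of a two-stage pipeline (host-to-device copy, then compute).

    Assumptions (hypothetical, for illustration only):
    - the work splits into n_chunks equal chunks,
    - copies are serialized on one copy engine,
    - computes are serialized on one GPU,
    - a chunk's compute may start only after its own copy has finished.
    """
    t_copy = total_transfer / n_chunks
    t_comp = total_compute / n_chunks
    copy_done = 0.0  # time at which the most recent copy finished
    comp_done = 0.0  # time at which the most recent compute finished
    for _ in range(n_chunks):
        copy_done += t_copy
        # Compute waits for its data and for the previous chunk's compute.
        comp_done = max(comp_done, copy_done) + t_comp
    return comp_done


if __name__ == "__main__":
    # Made-up costs: 4 time units of transfer and 4 of compute in total.
    for n in (1, 2, 4, 8):
        print(n, pipeline_time(4.0, 4.0, n))
```

With one chunk the model reduces to the non-streamed case (transfer, then compute). As the number of chunks grows, the total approaches the larger of the two costs plus one chunk of the other. The figures come from this model only and are not measurements from the paper.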
Keywords
coprocessors; fast Fourier transforms; memory architecture; 3D FFT implementation; asynchronous memory transfer; computer architecture; data streaming; memory access; memory bound problem; multiple GPU; parallel implementation; Computer architecture; Discrete Fourier transforms; Graphics processing unit; Instruction sets; Kernel; Synchronization; Three dimensional displays; 3D FFT; Data Streaming; Multiple GPUs; asynchronous memory transfers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2010 22nd International Symposium on
Conference_Location
Petropolis
Print_ISBN
978-1-4244-8877-3
Electronic_ISBN
978-0-7695-4276-8
Type
conf
DOI
10.1109/SBAC-PADW.2010.9
Filename
5645389