Title :
Exploring Data Streaming to Improve 3D FFT Implementation on Multiple GPUs
Author :
da Silva, C.P. ; Cupertino, Leandro F. ; Chevitarese, Daniel ; Pacheco, Marco Aurélio C ; Bentes, Cristiana
Author_Institution :
Dept. of Electr. Eng., PUC-Rio, Rio de Janeiro, Brazil
Abstract :
FFT is a well-known algorithm widely used in scientific and engineering applications. However, FFT is a memory-bound problem that still presents performance challenges to new generations of computer architectures because of its relatively low ratio of computation to memory access. On GPU architectures, where data transfers between host CPU memory and device memory are very expensive, the memory overhead can become a major bottleneck for large problems. In this work, we propose an efficient parallel implementation of FFT on multiple GPUs that tackles the overhead of host memory access by implementing a streaming scheme that hides the data transfer latency. The idea is to divide the problem into smaller ones, generating several lighter, asynchronous memory transfers from host to device so that computation on the data already transferred overlaps with the remaining transfers. We obtained an acceleration of approximately 60% over the non-streamed GPU implementation.
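To make the latency-hiding idea concrete, the following is a minimal timing sketch, not the authors' implementation. It assumes an idealized two-stage pipeline (host-to-device copy, then compute) over equal-sized chunks, with no per-transfer overhead, no device-to-host copy, and a single GPU. The function names and the input values are hypothetical and are not measurements from the paper.

```python
def serial_time(transfer, compute):
    """Non-streamed case: copy all data to the device, then compute."""
    return transfer + compute


def streamed_time(transfer, compute, chunks):
    """Idealized streamed case with `chunks` equal pieces.

    The first chunk's copy cannot be hidden. After that, each pipeline
    step costs as much as the slower of the two stages, and the last
    chunk's compute finishes the run.
    """
    t = transfer / chunks  # copy time per chunk
    c = compute / chunks   # compute time per chunk
    return t + (chunks - 1) * max(t, c) + c


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units (assumption, not data).
    transfer, compute = 6.0, 4.0
    for k in (1, 2, 4, 8):
        print(k, streamed_time(transfer, compute, k))
```

With one chunk the model reduces to the non-streamed time. As the number of chunks grows, the total approaches the cost of the slower stage alone, which is why a transfer-dominated problem benefits from splitting. Real transfers carry fixed per-call overhead, so the gain does not keep growing with ever smaller chunks.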
Keywords :
coprocessors; fast Fourier transforms; memory architecture; 3D FFT implementation; asynchronous memory transfer; computer architecture; data streaming; memory access; memory bound problem; multiple GPU; parallel implementation; Computer architecture; Discrete Fourier transforms; Graphics processing unit; Instruction sets; Kernel; Synchronization; Three dimensional displays; 3D FFT; Data Streaming; Multiple GPUs; asynchronous memory transfers;
Conference_Title :
Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2010 22nd International Symposium on
Conference_Location :
Petropolis
Print_ISBN :
978-1-4244-8877-3
Electronic_ISBN :
978-0-7695-4276-8
DOI :
10.1109/SBAC-PADW.2010.9