DocumentCode :
3459504
Title :
Implementation of Parallel 1-D FFT on GPU Clusters
Author :
Takahashi, Daisuke
Author_Institution :
Fac. of Eng., Inf. & Syst., Univ. of Tsukuba, Tsukuba, Japan
fYear :
2013
fDate :
3-5 Dec. 2013
Firstpage :
174
Lastpage :
180
Abstract :
In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on GPU clusters. This implementation is based on the six-step FFT algorithm. Because the parallel one-dimensional FFT requires three all-to-all communications, one goal for parallel FFTs on GPU clusters is to minimize the PCI Express transfer time and the MPI communication time. We demonstrate that the advanced features of MVAPICH2-GPU make it easy to overlap PCI Express transfers and MPI communication. Performance results of one-dimensional FFTs on a GPU cluster are reported. We successfully achieved a performance of over 763 GFlops on 128 nodes of the HA-PACS (268 nodes, 2.99 TFlops/node, 802 TFlops peak performance) for 2^34-point FFT.
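The six-step FFT named in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's GPU/MPI implementation: it assumes NumPy, and the function name `six_step_fft` and the factor arguments `n1`, `n2` are hypothetical names chosen for this illustration. The three transposes in the sketch are the steps that, in a distributed setting, would plausibly correspond to the three all-to-all communications the abstract mentions.

```python
import numpy as np


def six_step_fft(x, n1, n2):
    """Length n1*n2 FFT via the six-step decomposition (illustrative sketch)."""
    x = np.asarray(x, dtype=np.complex128)
    n = n1 * n2
    if x.shape != (n,):
        raise ValueError("x must be one-dimensional with length n1 * n2")

    # View the input as a[j2, j1] = x[j1 + j2 * n1].
    a = x.reshape(n2, n1)

    # Step 1: transpose so that each row holds a stride-n1 subsequence.
    b = a.T                                    # b[j1, j2]

    # Step 2: n1 independent FFTs of length n2 (along each row).
    c = np.fft.fft(b, axis=1)                  # c[j1, k2]

    # Step 3: multiply by the twiddle factors exp(-2*pi*i * j1 * k2 / n).
    j1 = np.arange(n1).reshape(n1, 1)
    k2 = np.arange(n2).reshape(1, n2)
    c = c * np.exp(-2j * np.pi * j1 * k2 / n)

    # Step 4: transpose.
    d = c.T                                    # d[k2, j1]

    # Step 5: n2 independent FFTs of length n1 (along each row).
    e = np.fft.fft(d, axis=1)                  # e[k2, k1]

    # Step 6: transpose so the output index is k = k2 + k1 * n2.
    f = e.T                                    # f[k1, k2]
    return f.reshape(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(128) + 1j * rng.standard_normal(128)
    # Compare against NumPy's reference FFT for a 128 = 8 * 16 split.
    print(np.allclose(six_step_fft(signal, 8, 16), np.fft.fft(signal)))
```

In this sketch the transposes are in-memory views; on a cluster the data would be partitioned across nodes, so each transpose would require moving data between processes, which is why overlapping PCI Express transfers with MPI communication matters.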
Keywords :
application program interfaces; fast Fourier transforms; graphics processing units; mathematics computing; message passing; parallel algorithms; pattern clustering; GPU clusters; HA-PACS; MPI communication time minimization; MVAPICH2-GPU; PCI Express transfer time minimization; all-to-all communications; graphics processing unit; parallel 1D FFT; parallel one-dimensional fast Fourier transform; six-step FFT algorithm; Arrays; Clustering algorithms; Equations; Graphics processing units; Indexes; Kernel; Performance evaluation; Fast Fourier transform; GPU cluster; all-to-all communication;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/CSE.2013.36
Filename :
6755214