DocumentCode
560138
Title
Scalable fast multipole methods on distributed heterogeneous architectures
Author
Hu, Qi ; Gumerov, Nail A. ; Duraiswami, Ramani
Author_Institution
Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
fYear
2011
fDate
12-18 Nov. 2011
Firstpage
1
Lastpage
12
Abstract
We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are independent, we map these respectively to the GPUs and CPUs. Careful analysis of the FMM is performed to distribute work optimally between the multicore CPUs and the GPU accelerators. We first develop a single node version where the CPU part is parallelized using OpenMP and the GPU version via CUDA. New parallel algorithms for creating FMM data structures are presented together with load balancing strategies for the single node and distributed multiple-node versions. Our implementation can perform the N-body sum for 128M particles on 16 nodes in 4.23 seconds, a performance not achieved by others in the literature on such clusters.
Keywords
data structures; divide and conquer methods; graphics processing units; iterative methods; multiprocessing systems; parallel architectures; CPU-GPU architecture; CUDA; FMM data structures; GPU accelerators; OpenMP; analysis based translation parts; distributed heterogeneous architectures; divide-and-conquer algorithm; iterative loop; multicore CPU; scalable fast multipole methods; time stepping loop; Arrays; Clustering algorithms; Graphics processing unit; Kernel; Receivers; Sorting;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location
Seattle, WA
Electronic_ISBN
978-1-4503-0771-0
Type
conf
Filename
6114400