• DocumentCode
    560138
  • Title
    Scalable fast multipole methods on distributed heterogeneous architectures
  • Author
    Hu, Qi; Gumerov, Nail A.; Duraiswami, Ramani
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
  • fYear
    2011
  • fDate
    12-18 Nov. 2011
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    We fundamentally reconsider the implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture, comprising multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are independent, we map these to the GPUs and CPUs, respectively. Careful analysis of the FMM is performed to distribute work optimally between the multicore CPUs and the GPU accelerators. We first develop a single-node version in which the CPU part is parallelized using OpenMP and the GPU part using CUDA. New parallel algorithms for creating FMM data structures are presented, together with load-balancing strategies for the single-node and distributed multiple-node versions. Our implementation can perform the N-body sum for 128M particles on 16 nodes in 4.23 seconds, a performance not achieved by others in the literature on such clusters.
  • Keywords
    data structures; divide and conquer methods; graphics processing units; iterative methods; multiprocessing systems; parallel architectures; CPU-GPU architecture; CUDA; FMM data structures; GPU accelerators; OpenMP; analysis based translation parts; distributed heterogeneous architectures; divide-and-conquer algorithm; iterative loop; multicore CPU; scalable fast multipole methods; time stepping loop; Arrays; Clustering algorithms; Graphics processing unit; Kernel; Receivers; Sorting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
  • Conference_Location
    Seattle, WA
  • Electronic_ISBN
    978-1-4503-0771-0
  • Type
    conf
  • Filename
    6114400
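The abstract describes the FMM as a spatial decomposition whose near-field "local summation" is computed directly (the part the authors map to GPUs). The following is a minimal illustrative sketch, not the authors' OpenMP/CUDA implementation. It assumes particles in the unit cube [0, 1)^3 and a 1/r (Laplace-type) kernel, and the names `morton_key`, `box_coords`, `bin_particles` and `near_field_sum` are hypothetical. It bins particles into octree leaf boxes by Morton (Z-order) key and evaluates the direct sum over each particle's own box and its up to 26 neighbors. The far-field part, which an FMM handles with multipole/local expansions and translations, is deliberately omitted.

```python
import math
from collections import defaultdict


def morton_key(ix: int, iy: int, iz: int, level: int) -> int:
    """Interleave the bits of integer box coordinates into a Z-order (Morton) key.

    Bit b of ix, iy, iz goes to key bits 3b, 3b+1, 3b+2, respectively.
    """
    key = 0
    for b in range(level):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def box_coords(p, level):
    """Integer coordinates of the leaf box containing point p in [0, 1)^3."""
    n = 1 << level  # boxes per side at this level
    return tuple(min(int(c * n), n - 1) for c in p)


def bin_particles(points, level):
    """Map each occupied leaf box's Morton key to the indices of its particles."""
    boxes = defaultdict(list)
    for i, p in enumerate(points):
        boxes[morton_key(*box_coords(p, level), level)].append(i)
    # Ordering boxes by key mirrors the sort-based octree construction often used in FMM codes.
    return dict(sorted(boxes.items()))


def near_field_sum(points, charges, level):
    """Direct 1/r sum over each particle's own box and its (up to 26) neighbor boxes."""
    n = 1 << level
    boxes = bin_particles(points, level)
    phi = [0.0] * len(points)
    for i, p in enumerate(points):
        bx, by, bz = box_coords(p, level)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nx, ny, nz = bx + dx, by + dy, bz + dz
                    if not (0 <= nx < n and 0 <= ny < n and 0 <= nz < n):
                        continue  # neighbor box lies outside the domain
                    for j in boxes.get(morton_key(nx, ny, nz, level), ()):
                        if j != i:  # skip the self-interaction
                            phi[i] += charges[j] / math.dist(p, points[j])
    return phi
```

At `level=0` every particle lies in the single root box, so `near_field_sum` reduces to the direct O(N^2) sum. At deeper levels, interactions between well-separated boxes are dropped and would have to be supplied by the far-field expansions, which is the part a CPU/GPU split like the one described above assigns to a different processor.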