Regional Information Center for Science and Technology (RICeST) - Optimization of the parallel black-box fast multipole method on CUDA

DocumentCode :

1955106

Title :

Optimization of the parallel black-box fast multipole method on CUDA

Author :

Takahashi, Toru ; Cecka, Cris ; Darve, Eric

Author_Institution :

Mech. Sci. & Eng., Nagoya Univ., Nagoya, Japan

fYear :

2012

fDate :

13-14 May 2012

Firstpage :

Lastpage :

Abstract :

The fast multipole method (FMM) is a widely used numerical algorithm in computational science and engineering. A recent research trend is to perform the FMM on many-core processors, including Graphical Processing Units (GPUs). In this paper, we discuss methods to optimize, on GPUs, the black-box FMM (bbFMM), a variant of the FMM that accepts any user-specified non-oscillatory kernel. Using CUDA-capable GPUs, we focused our analysis on the two most time-consuming phases of the bbFMM: the multipole-to-local (M2L) operation and the short-range direct kernel computation. Following a previously published paper by Takahashi et al. (2011), we incorporated the best GPU implementation of the M2L operation into a complete bbFMM code. We created a highly optimized CPU version of the code alongside the CUDA code. Although the GPU provides a significant speed-up during the M2L phase, the speed-up was more moderate in the direct short-range calculation. We found that the 12-core CPU runs close to its peak performance (using all cores) during that phase of the calculation. Extensive algorithmic and performance analysis is provided for the CPU and GPU, along with comparisons with previously published work; these suggest that the current implementation is one of the most efficient for this class of FMM.
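The two phases named in the abstract can be illustrated with a small, hedged sketch. It is not the authors' CUDA code, and it is written under three assumptions. First, the user supplies the kernel as a plain function K(x, y). In the example below this is the hypothetical `laplace_kernel`, 1/|x - y|. Second, the short-range phase is a direct sum over particle pairs. Third, the M2L step in a bbFMM-style scheme applies the kernel matrix K(xbar_m, ybar_n), evaluated between the Chebyshev nodes of two well-separated cells, to the source cell's multipole coefficients. The names `direct_sum`, `cell_nodes` and `m2l` are illustrative only. The sketch also omits any compression of the M2L operator.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Kernel = Callable[[Point, Point], float]


def laplace_kernel(x: Point, y: Point) -> float:
    """Example non-oscillatory kernel K(x, y) = 1/|x - y|; self-interaction set to 0."""
    r = math.dist(x, y)
    return 0.0 if r == 0.0 else 1.0 / r


def direct_sum(targets: Sequence[Point], sources: Sequence[Point],
               weights: Sequence[float], kernel: Kernel) -> List[float]:
    """Direct (short-range) evaluation phi_i = sum_j K(x_i, y_j) * w_j.
    Work is O(len(targets) * len(sources)); the kernel is a black box."""
    return [sum(kernel(x, y) * w for y, w in zip(sources, weights)) for x in targets]


def chebyshev_nodes_1d(n: int, center: float, half_width: float) -> List[float]:
    """n Chebyshev nodes of the first kind, mapped from [-1, 1] onto the cell's interval."""
    return [center + half_width * math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]


def cell_nodes(n: int, center: Point, half_width: float) -> List[Point]:
    """Tensor-product grid of n**3 Chebyshev nodes for a cubic cell."""
    xs, ys, zs = (chebyshev_nodes_1d(n, c, half_width) for c in center)
    return [(a, b, c) for a in xs for b in ys for c in zs]


def m2l(target_nodes: Sequence[Point], source_nodes: Sequence[Point],
        multipole: Sequence[float], kernel: Kernel) -> List[float]:
    """Hypothetical M2L: L_m = sum_n K(xbar_m, ybar_n) * W_n, i.e. a dense
    kernel-matrix-times-vector product between the two cells' Chebyshev nodes."""
    return direct_sum(target_nodes, source_nodes, multipole, kernel)


if __name__ == "__main__":
    phi = direct_sum([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
                     [2.0, 4.0], laplace_kernel)
    print(phi)  # [4.0]  (2/1 + 4/2)
```

In this sketch, both phases reduce to applying a kernel matrix to a vector. The M2L step operates on small, fixed node sets, so its arithmetic is regular. The short-range phase works on irregular particle lists. That contrast is one plausible reading of why the reported GPU speed-ups differ between the two phases, but it is an interpretation and not a claim made by the paper.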

Keywords :

graphics processing units; numerical analysis; parallel architectures; CUDA-capable GPU; M2L operation; bbFMM; black-box FMM; graphical processing units; many-core processors; multipole-to-local operation; nonoscillatory kernel; numerical algorithm; parallel black-box fast multipole method; short-range direct kernel computation; Fast multipole method (FMM); Graphical processing units (GPU); High performance computing; Kernel independent

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Innovative Parallel Computing (InPar), 2012

Conference_Location :

San Jose, CA

Print_ISBN :

978-1-4673-2632-2

Electronic_ISBN :

978-1-4673-2631-5

Type :

conf

DOI :

10.1109/InPar.2012.6339607

Filename :

6339607

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1955106