Regional Information Center for Science and Technology - Increasing efficiency of Monte Carlo particle-fluid collision calculations on GPU

DocumentCode :

1595708

Title :

Increasing efficiency of Monte Carlo particle-fluid collision calculations on GPU

Author :

Bardel, Charles ; Verboncoeur, John

Author_Institution :

Electr. & Comput. Eng., Michigan State Univ., East Lansing, MI, USA

fYear :

2013

Firstpage :

Lastpage :

Abstract :

Summary form only given. Monte Carlo particle collision calculations can be very computationally expensive for particle-in-cell codes. In the case of background fluid collisional calculations, where each particle calculation is totally independent of other collisions, the calculations can be set up as highly parallel. Porting to GPU platforms has shown a two-order-of-magnitude decrease in computation time compared to single-processor performance. One approach is to simply apply a function to every particle, which involves computing the particle energy, taking a square root to obtain the speed, and either interpolating tabulated cross sections or evaluating a curve fit for each process for every particle [1]. Then, based on this probability of collision, the collisional dynamics code might be executed. For collisional probabilities ≪ 1, this is inefficient for finding particles to collide and load-imbalanced for the collisional dynamics on the vector-architecture-like (single-instruction, multiple-data, SIMD) capabilities available on the GPU [2]. The alternative approach is to use the null collision method [1], where particles are selected for collision at random using the total collision probability, which is independent of particle energy and position. However, this sparse random access of particles in the particle array, as needed for the null collision method [1], has drawbacks on the GPU due to the SIMD architecture. GPU threads that are grouped in hardware are called warps. Each warp can issue only one computational or memory instruction at a time. However, when two memory instructions are located within 128 bytes [2] of each other, they can be 'coalesced' into one instruction. Using the data structure and algorithm presented in [3] for efficient particle-to-grid charge accumulation on the GPU, which ensures that all particles contained within a cell are contiguous in memory, this paper examines the effect of selecting particles for collision that are contiguous in that same list. This setup would capitalize on the null collision method's not needing to calculate the energy of each particle and would optimize the memory bandwidth through the GPU. The key point under investigation is whether the particle sort algorithm in [3] retains enough of the entropy that the particle list gains from particle cell crossings. This will be examined by varying the size of the contiguous particle block and measuring the thermal equilibration time.
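
The selection step contrasted in the abstract can be illustrated with a small host-side sketch. This is not the authors' implementation: it is a plain Python model, and the function names, the 4-byte-per-particle layout (one float32 component in a struct-of-arrays array), and the block size are all assumptions made for illustration. It covers only how many particles are picked and which indices are chosen, and it leaves out the later real-versus-null collision test and the collisional dynamics. Counting the distinct 128-byte segments touched gives a rough proxy for how well the accesses could coalesce.

```python
import random


def null_collision_count(n_particles, p_total, rng):
    """Number of particles to pick this step: N * P_total, with the
    fractional part resolved by one random draw."""
    expected = n_particles * p_total
    count = int(expected)
    if rng.random() < expected - count:
        count += 1
    return count


def select_random(n_particles, count, rng):
    """Sparse selection: distinct indices drawn uniformly from the list."""
    return rng.sample(range(n_particles), count)


def select_contiguous_blocks(n_particles, count, block_size, rng):
    """Block selection: pick whole contiguous blocks at random.
    Assumes n_particles is a multiple of block_size (hypothetical layout)."""
    n_blocks = n_particles // block_size
    blocks_needed = -(-count // block_size)  # ceiling division
    indices = []
    for b in rng.sample(range(n_blocks), blocks_needed):
        start = b * block_size
        indices.extend(range(start, start + block_size))
    return indices[:count]


def segments_touched(indices, bytes_per_particle=4, segment_bytes=128):
    """Distinct 128-byte segments covered by the selected indices.
    bytes_per_particle=4 is an assumed struct-of-arrays float32 layout."""
    return len({(i * bytes_per_particle) // segment_bytes for i in indices})


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4096
    count = null_collision_count(n, 0.03125, rng)
    sparse = select_random(n, count, rng)
    blocked = select_contiguous_blocks(n, count, 32, rng)
    print("selected:", count)
    print("segments (random):", segments_touched(sparse))
    print("segments (contiguous):", segments_touched(blocked))
```

With 32 particles of 4 bytes each per block, every block occupies exactly one 128-byte segment, so the contiguous variant touches the minimum number of segments for a given selection count. Whether such blocks still sample the velocity distribution well enough is the open question the abstract raises.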

Keywords :

Monte Carlo methods; data structures; graphics processing units; interpolation; multi-threading; parallel algorithms; physics computing; plasma collision processes; plasma simulation; probability; storage management; GPU; Monte Carlo particle-fluid collision calculations; SIMD; background fluid collisional calculations; data algorithm; data structure; entropy; grid charge accumulation; interpolation; memory bandwidth; memory instruction; null collision method; particle energy; particle sort algorithm; particle-in-cell codes; single-instruction multiple data; thermal equilibration time; vector architecture; warps; Buildings; Computer architecture; Computers; Educational institutions; Graphics processing units; Microprocessors; Monte Carlo methods;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Plasma Science (ICOPS), 2013 Abstracts IEEE International Conference on

Conference_Location :

San Francisco, CA

ISSN :

0730-9244

Type :

conf

DOI :

10.1109/PLASMA.2013.6634961

Filename :

6634961

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1595708