Title :
Low-Cost Parallel Algorithms for 2:1 Octree Balance
Author :
Isaac, Tobin; Burstedde, Carsten; Ghattas, Omar
Author_Institution :
Inst. for Comput. Eng. & Sci. (ICES), Univ. of Texas at Austin, Austin, TX, USA
Abstract :
The logical structure of a forest of octrees can be used to create scalable algorithms for parallel adaptive mesh refinement (AMR), which has recently been demonstrated for several petascale applications. Among various frequently used octree-based mesh operations, including refinement, coarsening, partitioning, and enumerating nodes, ensuring a 2:1 size balance between neighboring elements has historically been the most expensive in terms of CPU time and communication volume. The 2:1 balance operation is thus a primary target to optimize. One important component of a parallel balance algorithm is the ability to determine whether any two given octants have a consistent distance/size relation. Based on new logical concepts, we propose fast algorithms for making this decision for all types of 2:1 balance conditions in 2D and 3D. Since we achieve this without constructing any parent nodes in the tree that would otherwise need to be sorted and communicated, we significantly reduce the required memory and communication volume. In addition, we propose a lightweight collective algorithm for reversing the asymmetric communication pattern induced by non-local octant interactions. We have implemented our improvements as part of the open-source "p4est" software. Benchmarking this code with both synthetic and simulation-driven adapted meshes, we demonstrate much reduced runtime and excellent weak and strong scalability. On our largest benchmark problem, with 5.13 × 10^11 octants, the new 2:1 balance algorithm executes in less than 8 seconds on 112,128 CPU cores of the Jaguar Cray XT5 supercomputer.
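To make the 2:1 balance condition concrete, the following is a minimal Python sketch of a pairwise check between two leaf quadrants in 2D. It is not the paper's algorithm: it only tests two directly adjacent quadrants and does not capture the non-local distance/size relation the abstract refers to, and the 3D edge condition is omitted. The names `Quadrant`, `contact`, `is_2to1_balanced`, and the constant `MAX_LEVEL` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical maximum refinement level: the root quadrant has side 2**MAX_LEVEL.
MAX_LEVEL = 4


@dataclass(frozen=True)
class Quadrant:
    x: int      # lower-left corner in integer coordinates
    y: int
    level: int  # 0 = root; each level halves the side length

    @property
    def size(self) -> int:
        return 1 << (MAX_LEVEL - self.level)


def contact(a: Quadrant, b: Quadrant) -> str:
    """Classify how two non-overlapping quadrants touch: 'face', 'corner', or 'none'."""
    # Signed overlap of the closed intervals along each axis:
    # > 0 shared positive length, == 0 touching at one coordinate, < 0 a gap.
    ox = min(a.x + a.size, b.x + b.size) - max(a.x, b.x)
    oy = min(a.y + a.size, b.y + b.size) - max(a.y, b.y)
    if ox < 0 or oy < 0:
        return "none"
    if ox > 0 and oy > 0:
        raise ValueError("quadrants overlap; distinct leaves never do")
    if ox == 0 and oy == 0:
        return "corner"
    return "face"


def is_2to1_balanced(a: Quadrant, b: Quadrant, condition: str = "corner") -> bool:
    """True if the pair satisfies the 2:1 condition.

    condition="face":   only face neighbors must differ by at most one level.
    condition="corner": face and corner neighbors must differ by at most one level.
    """
    if condition not in ("face", "corner"):
        raise ValueError("condition must be 'face' or 'corner'")
    kind = contact(a, b)
    constrained = kind == "face" or (kind == "corner" and condition == "corner")
    return not constrained or abs(a.level - b.level) <= 1
```

For example, with `MAX_LEVEL = 4`, `Quadrant(0, 0, 1)` (side 8) and `Quadrant(8, 0, 3)` (side 2) share a face but differ by two levels, so the pair violates 2:1 balance; `Quadrant(8, 8, 3)` touches the same large quadrant only at a corner, so it is acceptable under the face condition but not under the corner condition.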
Keywords :
octrees; parallel algorithms; public domain software; AMR; CPU time; Jaguar Cray XT5 supercomputer; asymmetric communication pattern; coarsening nodes; enumerating nodes; lightweight collective algorithm; low-cost parallel algorithms; octree balance; octree-based mesh operations; opensource p4est software; parallel adaptive mesh refinement; parallel balance algorithm; partitioning nodes; petascale applications; refinement nodes; Arrays; Partitioning algorithms; Adaptive mesh refinement; High performance computing; Scientific computing
Conference_Title :
2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS)
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0975-2
DOI :
10.1109/IPDPS.2012.47