Title of article

MPI-based implementation of a PCG solver using an EBE architecture and preconditioner for implicit, 3-D finite element analysis

Author/Authors

Arne S. Gullerud, Robert H. Dodds Jr.

Issue Information

Journal issue with serial number, year 2001

Pages

From page

553

To page

575

Abstract

This work describes a coarse-grain parallel implementation of a linear preconditioned conjugate gradient solver using an element-by-element architecture and preconditioner for computation. The solver, implemented within a nonlinear, implicit finite element code, uses an MPI-based message-passing approach to provide portable parallel execution on shared, distributed, and distributed-shared memory computers. The flexibility of the element-by-element approach permits a dual-level mesh decomposition: a coarse, domain-level decomposition creates a load-balanced domain for each processor for parallel computation, while a second-level decomposition breaks each domain into blocks of similar elements (same constitutive model, order of integration, element type) for fine-grained parallel computation on each processor. The key contribution here is a new parallel implementation of the Hughes–Winget (HW) element-by-element preconditioner suitable for arbitrary, unstructured meshes. The implementation couples an unstructured dependency graph with a new balanced graph-coloring algorithm to schedule parallel computations within and across domains. The code also includes the diagonal preconditioner and a modern parallel (threaded) sparse direct solver for comparison. Three example problems with up to 158,000 elements and 180,000 nodes analyzed on an SGI/Cray Origin 2000 illustrate the parallel performance of the algorithms and preconditioners. Analyses with varying block sizes illustrate that the two-level decomposition improves overall execution speed with the block size tuned for the cache memory architecture of the executing platform. This implementation of the HW preconditioner shows reasonable parallel efficiency – typically 80% on 48 processors. Efficiency for the diagonal preconditioner is also high, with total speedups reaching 86% on 48 CPUs. Calculation of the tangent element stiffnesses shows superlinear speedups for each of the test problems, while the computation of strains/stresses/residual forces shows 80% parallel efficiency on 48 processors.

Keywords

Element-by-element computation, Hughes–Winget preconditioner, Parallel finite elements, Message passing, Conjugate gradient, Domain decomposition, Coloring algorithms

Journal title

Computers and Structures

Serial Year

2001

Record number

1208640

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=1208640