Title of article :
MPI-based implementation of a PCG solver using an EBE architecture and preconditioner for implicit, 3-D finite element analysis
Author/Authors :
Arne S. Gullerud, Robert H. Dodds Jr.
Issue Information :
Journal with serial number, year 2001
Pages :
23
From page :
553
To page :
575
Abstract :
This work describes a coarse-grain parallel implementation of a linear preconditioned conjugate gradient solver using an element-by-element architecture and preconditioner for computation. The solver, implemented within a nonlinear, implicit finite element code, uses an MPI-based message-passing approach to provide portable parallel execution on shared, distributed, and distributed-shared memory computers. The flexibility of the element-by-element approach permits a dual-level mesh decomposition: a coarse, domain-level decomposition creates a load-balanced domain for each processor for parallel computation, while a second-level decomposition breaks each domain into blocks of similar elements (same constitutive model, order of integration, element type) for fine-grained parallel computation on each processor. The key contribution here is a new parallel implementation of the Hughes–Winget (HW) element-by-element preconditioner suitable for arbitrary, unstructured meshes. The implementation couples an unstructured dependency graph with a new balanced graph-coloring algorithm to schedule parallel computations within and across domains. The code also includes the diagonal preconditioner and a modern parallel (threaded) sparse direct solver for comparison. Three example problems with up to 158,000 elements and 180,000 nodes, analyzed on an SGI/Cray Origin 2000, illustrate the parallel performance of the algorithms and preconditioners. Analyses with varying block sizes illustrate that the two-level decomposition improves overall execution speed when the block size is tuned for the cache memory architecture of the executing platform. This implementation of the HW preconditioner shows reasonable parallel efficiency, typically 80% on 48 processors. Efficiency for the diagonal preconditioner is also high, with parallel efficiency reaching 86% on 48 CPUs. Calculation of the tangent element stiffnesses shows superlinear speedups for each of the test problems, while the computation of strains/stresses/residual forces shows 80% parallel efficiency on 48 processors.
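The element-by-element idea in the abstract can be illustrated with a small serial sketch. It is not the authors' code: the 1-D bar model, the function names (`ebe_matvec`, `ebe_diagonal`, `pcg`), and the tolerance are assumptions made for illustration, and it uses only the diagonal preconditioner that the abstract mentions for comparison, not the Hughes–Winget preconditioner, MPI, or graph coloring. It shows the product K x being formed by gathering and scattering element contributions, so the global stiffness matrix is never assembled.

```python
# Illustrative sketch only: every name below is hypothetical, and the
# 1-D bar model is an assumption, not the paper's code or test problem.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def ebe_matvec(elements, ke, x, fixed):
    """Return y = K x by looping over elements; K is never assembled."""
    y = [0.0] * len(x)
    for nodes in elements:
        xe = [x[n] for n in nodes]                 # gather element dofs
        for i, ni in enumerate(nodes):             # scatter-add (ke @ xe)
            y[ni] += sum(ke[i][j] * xe[j] for j in range(len(nodes)))
    for n in fixed:
        y[n] = x[n]                                # identity row for fixed dofs
    return y

def ebe_diagonal(elements, ke, n_dofs, fixed):
    """Accumulate diag(K) from element matrices (diagonal preconditioner)."""
    diag = [0.0] * n_dofs
    for nodes in elements:
        for i, ni in enumerate(nodes):
            diag[ni] += ke[i][i]
    for n in fixed:
        diag[n] = 1.0
    return diag

def pcg(elements, ke, b, fixed, tol=1e-10, max_iter=200):
    """Diagonally preconditioned CG; assumes b is zero at the fixed dofs."""
    diag = ebe_diagonal(elements, ke, len(b), fixed)
    x = [0.0] * len(b)
    r = list(b)                                    # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]       # apply preconditioner
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = ebe_matvec(elements, ke, p, fixed)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Four unit-stiffness bar elements, node 0 fixed, unit load at the free end.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
ke = [[1.0, -1.0], [-1.0, 1.0]]
load = [0.0, 0.0, 0.0, 0.0, 1.0]
u = pcg(elements, ke, load, fixed=[0])
```

For this toy bar the exact displacement at node i equals i, so `u` should approach `[0, 1, 2, 3, 4]`. In a parallel setting of the kind the abstract describes, the element loop would be split across domains and blocks, and elements sharing nodes would have to be scheduled so that their scatter-add steps do not conflict.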
Keywords :
Element-by-element computation , Hughes–Winget preconditioner , Parallel finite elements , message passing , Conjugate Gradient , domain decomposition , Coloring algorithms
Journal title :
Computers and Structures
Serial Year :
2001
Record number :
1208640