• DocumentCode
    1347979
  • Title

    Efficient sparse LU factorization with partial pivoting on distributed memory architectures

  • Author

    Fu, Cong ; Jiao, Xiangmin ; Yang, Tao

  • Author_Institution
    Siemens Pyramid Inf. Syst., San Jose, CA, USA
  • Volume
    9
  • Issue
    2
  • fYear
    1998
  • fDate
    2/1/1998
  • Firstpage
    109
  • Lastpage
    125
  • Abstract
    A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change computation and nonzero fill-in structures during the elimination process. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. The S* approach adopts static symbolic factorization to avoid run-time control overhead, incorporates 2D L/U supernode partitioning and amalgamation strategies to improve caching performance, and exploits irregular task parallelism embedded in sparse LU using asynchronous computation scheduling. The paper discusses and compares the algorithms using 1D and 2D data mapping schemes, and presents experimental studies on Cray-T3D and T3E. The performance results for a set of nonsymmetric benchmark matrices are very encouraging, and S* has achieved up to 6.878 GFLOPS on 128 T3E nodes. To the best of our knowledge, this is the highest performance ever achieved for this challenging problem and the previous record was 2.583 GFLOPS on shared memory machines.
  • Keywords
    distributed memory systems; matrix decomposition; parallel algorithms; sparse matrices; 6.878 GFLOPS; Cray-T3D; Gaussian elimination; S*; T3E; distributed memory architectures; high performance GEPP; parallelizing; partial pivoting; sparse LU factorization; static symbolic factorization; Concurrent computing; Data structures; Equations; Memory architecture; Numerical stability; Parallel processing; Processor scheduling; Runtime; Sparse matrices; Symmetric matrices
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.663864
  • Filename
    663864