• DocumentCode
    1886209
  • Title
    Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers
  • Author
    Rothberg, Edward
  • Author_Institution
    Intel Sci. Comput., Beaverton, OR, USA
  • fYear
    1994
  • fDate
    23-25 May 1994
  • Firstpage
    324
  • Lastpage
    333
  • Abstract
    Sparse Cholesky factorization has historically achieved extremely low performance on distributed memory multiprocessors. Three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. All of these issues have in fact already been addressed. Specifically: (1) single-node performance can be improved by moving from a column-oriented approach, where the computational kernel is Level 1 BLAS, to either a panel- or block-oriented approach, where the kernel is Level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers providing higher communication bandwidth than previous parallel computers; and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision MFLOPS on 32 processors of the Intel Paragon system, 1 GFLOPS on 64 processors, and 1.7 GFLOPS on 128 processors. This paper also presents a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.
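    To make the abstract's central idea concrete, the following is an illustrative sketch (not the paper's code) of a right-looking blocked dense Cholesky factorization in pure Python. The inner diagonal-block factorization corresponds to the Level 1 BLAS-style column-oriented kernel; the panel triangular solve and the trailing-matrix update correspond to the TRSM- and GEMM/SYRK-like Level 3 BLAS kernels that the block-oriented approach exploits. Function and variable names are hypothetical.

```python
import math

def cholesky_blocked(a, nb=2):
    """In-place lower-triangular Cholesky of SPD matrix a (list of lists).

    Illustrative only: sparse codes apply the same three kernels to
    dense blocks of the factor, rather than to the whole matrix.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Factor the diagonal block (unblocked, column-oriented;
        #    Level 1 BLAS-style inner loops).
        for j in range(k0, k1):
            for k in range(k0, j):
                for i in range(j, k1):
                    a[i][j] -= a[i][k] * a[j][k]
            a[j][j] = math.sqrt(a[j][j])
            for i in range(j + 1, k1):
                a[i][j] /= a[j][j]
        # 2. Panel triangular solve below the diagonal block (TRSM-like).
        for i in range(k1, n):
            for j in range(k0, k1):
                for k in range(k0, j):
                    a[i][j] -= a[i][k] * a[j][k]
                a[i][j] /= a[j][j]
        # 3. Trailing-matrix update (GEMM/SYRK-like): this is the
        #    Level 3 BLAS kernel where most of the flops occur.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[j][k]
                a[i][j] -= s
    # Zero the strict upper triangle so the result is plainly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

In an optimized implementation the trailing update (step 3) is a single matrix-matrix multiply on cached blocks, which is why the block-oriented approach raises single-node performance relative to the column-oriented one.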
  • Keywords
    mathematics computing; matrix algebra; parallel algorithms; parallel machines; performance evaluation; 1 GFLOPS; 1.7 GFLOPS; 650 MFLOPS; Intel Paragon multicomputer; Intel iPSC/860 multicomputer; Level 1 BLAS; Level 3 BLAS; benchmark matrices; block-oriented approach; column-oriented approach; communication bandwidth; communication hardware; computational kernel; distributed memory multiprocessors; efficient sequential methods; interprocessor communication bandwidth; matrix size; memory per node; panel-oriented approach; parallel factorization; parallel factorization methods; parallel machines; single-node performance; sparse Cholesky factorization; sparse matrices; Bandwidth; Concurrent computing; Finite element methods; Hardware; High performance computing; Kernel; Linear programming; Parallel machines; Sparse matrices; Supercomputers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Scalable High-Performance Computing Conference, 1994
  • Conference_Location
    Knoxville, TN
  • Print_ISBN
    0-8186-5680-8
  • Type
    conf
  • DOI
    10.1109/SHPCC.1994.296661
  • Filename
    296661