• DocumentCode
    1886209
  • Title
    Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers
  • Author
    Rothberg, Edward
  • Author_Institution
    Intel Sci. Comput., Beaverton, OR, USA
  • fYear
    1994
  • fDate
    23-25 May 1994
  • Firstpage
    324
  • Lastpage
    333
  • Abstract
    Sparse Cholesky factorization has historically achieved extremely low performance on distributed memory multiprocessors. Three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. All of these issues have in fact already been addressed. Specifically: (1) single-node performance can be improved by moving from a column-oriented approach, where the computational kernel is Level 1 BLAS, to either a panel- or block-oriented approach, where the kernel is Level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers providing higher communication bandwidth than previous parallel computers; and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision MFLOPS on 32 processors of the Intel Paragon system, 1 GFLOPS on 64 processors, and 1.7 GFLOPS on 128 processors. This paper also presents a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.
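    To make the abstract's central idea concrete, the following is an illustrative sketch (not the paper's code) of a right-looking blocked dense Cholesky factorization in pure Python. The inner diagonal-block factorization corresponds to the Level 1 BLAS-style column-oriented kernel; the panel triangular solve and the trailing-matrix update correspond to the TRSM- and GEMM/SYRK-like Level 3 BLAS kernels that the block-oriented approach exploits. Function and variable names are hypothetical.

```python
import math

def cholesky_blocked(a, nb=2):
    """In-place lower-triangular Cholesky of SPD matrix a (list of lists).

    Illustrative only: sparse codes apply the same three kernels to
    dense blocks of the factor, rather than to the whole matrix.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Factor the diagonal block (unblocked, column-oriented;
        #    Level 1 BLAS-style inner loops).
        for j in range(k0, k1):
            for k in range(k0, j):
                for i in range(j, k1):
                    a[i][j] -= a[i][k] * a[j][k]
            a[j][j] = math.sqrt(a[j][j])
            for i in range(j + 1, k1):
                a[i][j] /= a[j][j]
        # 2. Panel triangular solve below the diagonal block (TRSM-like).
        for i in range(k1, n):
            for j in range(k0, k1):
                for k in range(k0, j):
                    a[i][j] -= a[i][k] * a[j][k]
                a[i][j] /= a[j][j]
        # 3. Trailing-matrix update (GEMM/SYRK-like): this is the
        #    Level 3 BLAS kernel where most of the flops occur.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[j][k]
                a[i][j] -= s
    # Zero the strict upper triangle so the result is plainly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

In an optimized implementation the trailing update (step 3) is a single matrix-matrix multiply on cached blocks, which is why the block-oriented approach raises single-node performance relative to the column-oriented one.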
  • Keywords
    mathematics computing; matrix algebra; parallel algorithms; parallel machines; performance evaluation; 1 GFLOPS; 1.7 GFLOPS; 650 MFLOPS; Intel Paragon multicomputer; Intel iPSC/860 multicomputer; Level 1 BLAS; Level 3 BLAS; benchmark matrices; block-oriented approach; column-oriented approach; communication bandwidth; communication hardware; computational kernel; distributed memory multiprocessors; efficient sequential methods; interprocessor communication bandwidth; matrix size; memory per node; panel-oriented approach; parallel factorization; parallel factorization methods; parallel machines; single-node performance; sparse Cholesky factorization; sparse matrices; Bandwidth; Concurrent computing; Finite element methods; Hardware; High performance computing; Kernel; Linear programming; Parallel machines; Sparse matrices; Supercomputers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Scalable High-Performance Computing Conference, 1994
  • Conference_Location
    Knoxville, TN
  • Print_ISBN
    0-8186-5680-8
  • Type
    conf
  • DOI
    10.1109/SHPCC.1994.296661
  • Filename
    296661