DocumentCode :
1886209
Title :
Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers
Author :
Rothberg, Edward
Author_Institution :
Intel Sci. Comput., Beaverton, OR, USA
fYear :
1994
fDate :
23-25 May 1994
Firstpage :
324
Lastpage :
333
Abstract :
Sparse Cholesky factorization has historically achieved extremely low performance on distributed memory multiprocessors. Three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. All of these issues have in fact already been addressed. Specifically: (1) single-node performance can be improved by moving from a column-oriented approach, where the computational kernel is Level 1 BLAS, to either a panel- or block-oriented approach, where the kernel is Level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers providing higher communication bandwidth than previous parallel computers; and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision MFLOPS on 32 processors of the Intel Paragon system, 1 GFLOPS on 64 processors, and 1.7 GFLOPS on 128 processors. The paper also presents a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.
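The central single-node improvement the abstract describes is replacing column-by-column elimination (Level 1 BLAS vector updates) with block elimination, whose trailing-matrix update is a matrix-matrix product (Level 3 BLAS). A minimal dense sketch of that blocked structure, using NumPy in place of explicit BLAS calls (this is an illustration of the kernel shape, not the paper's sparse implementation):

```python
import numpy as np

def blocked_cholesky(A, b=2):
    """Lower-triangular Cholesky A = L @ L.T, computed block column by
    block column. The trailing update A22 -= L21 @ L21.T is a
    matrix-matrix product (Level 3 BLAS shape), in contrast to the
    vector (Level 1 BLAS) updates of a column-oriented factorization."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, b):
        e = min(k + b, n)
        # Factor the small diagonal block.
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:
            # Triangular solve for the panel below the diagonal block:
            # L21 satisfies L21 @ L11.T = A21, i.e. L11 @ L21.T = A21.T.
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
            # Level 3 BLAS-style trailing-matrix update.
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)
```

The matrix-matrix update reuses each loaded block many times, which is why block (and panel) kernels run much closer to peak than column kernels on cache-based nodes like those of the iPSC/860 and Paragon.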
Keywords :
mathematics computing; matrix algebra; parallel algorithms; parallel machines; performance evaluation; 1 GFLOPS; 1.7 GFLOPS; 650 MFLOPS; Intel Paragon multicomputer; Intel iPSC/860 multicomputer; Level 1 BLAS; Level 3 BLAS; benchmark matrices; block-oriented approach; column-oriented approach; communication bandwidth; communication hardware; computational kernel; distributed memory multiprocessors; efficient sequential methods; interprocessor communication bandwidth; matrix size; memory per node; panel-oriented approach; parallel factorization; parallel factorization methods; parallel machines; single-node performance; sparse Cholesky factorization; sparse matrices; Bandwidth; Concurrent computing; Finite element methods; Hardware; High performance computing; Kernel; Linear programming; Parallel machines; Sparse matrices; Supercomputers;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the Scalable High-Performance Computing Conference, 1994
Conference_Location :
Knoxville, TN
Print_ISBN :
0-8186-5680-8
Type :
conf
DOI :
10.1109/SHPCC.1994.296661
Filename :
296661