Title :
Optimizing parallel multiplication operation for rectangular and transposed matrices
Author :
Krishnan, Manojkumar ; Nieplocha, Jarek
Author_Institution :
Dept. of Comput. Sci. & Math., Pacific Northwest Nat. Lab., Richland, WA, USA
Abstract :
In many applications, matrix multiplication involves matrices of different shapes. The shape of the matrices can significantly affect the performance of a matrix multiplication algorithm. This paper describes extensions of the SRUMMA parallel matrix multiplication algorithm (Krishnan and Nieplocha, 2004) that improve performance for transposed and rectangular matrices. Our approach relies on a set of hybrid algorithms, chosen according to the shapes of the matrices and the transpose operators involved. The algorithm exploits the performance characteristics of clusters and shared memory systems: it differs from other parallel matrix multiplication algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing. Experimental results on clusters and shared memory systems demonstrate consistent performance advantages over pdgemm from the ScaLAPACK parallel linear algebra package.
Keywords :
matrix multiplication; optimisation; parallel processing; shared memory systems; workstation clusters; SRUMMA parallel matrix multiplication; ScaLAPACK parallel linear algebra package; cluster systems; hybrid algorithms; pdgemm; rectangular matrices; remote memory access communication; shared memory access communication; transposed matrices; Access protocols; Aggregates; Algorithm design and analysis; Clustering algorithms; Concurrent computing; Costs; Distributed computing; Laboratories; Mathematics; Scalability
Conference_Title :
Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on
Print_ISBN :
0-7695-2152-5
DOI :
10.1109/ICPADS.2004.1316103