Title :
Improving communication performance in dense linear algebra via topology aware collectives
Author :
Solomonik, Edgar ; Bhatele, Abhinav ; Demmel, James
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of California at Berkeley, Berkeley, CA, USA
Abstract :
Recent results have shown that topology aware mapping reduces network contention in communication-intensive kernels on massively parallel machines. We demonstrate that on mesh interconnects, topology aware mapping also allows for the utilization of highly-efficient topology aware collectives. We map novel 2.5D dense linear algebra algorithms to exploit rectangular collectives on cuboid partitions allocated by a Blue Gene/P supercomputer. Our mappings allow the algorithms to exploit optimized line multicasts and reductions. Commonly used 2D algorithms cannot be mapped in this fashion. On 16,384 nodes (65,536 cores) of Blue Gene/P, 2.5D algorithms that exploit rectangular collectives are significantly faster than 2D matrix multiplication (MM) and LU factorization, up to 8.7x and 2.1x, respectively. These speed-ups are due to communication reduction (up to 95.6% for 2.5D MM with respect to 2D MM). We also derive novel LogP-based performance models for rectangular broadcasts and reductions. Using those, we model the performance of matrix multiplication and LU factorization on a hypothetical exascale architecture.
Keywords :
linear algebra; mathematics computing; matrix multiplication; parallel machines; 2.5D dense linear algebra algorithm; Blue Gene/P supercomputer; LU factorization; LogP-based novel performance model; communication performance; communication reduction; communication-intensive kernels; cuboid partitions; hypothetical exascale architecture; line multicast optimisation; massively parallel machines; mesh interconnects; network contention reduction; rectangular broadcasts; rectangular collectives; rectangular reduction; topology aware collectives; topology aware mapping; Algorithm design and analysis; Bandwidth; Linear algebra; Network topology; Partitioning algorithms; Three dimensional displays; Topology; communication; exascale; interconnect topology; mapping; performance
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Electronic_ISBN :
978-1-4503-0771-0