Title :
Searching for the Optimal Data Partitioning Shape for Parallel Matrix Matrix Multiplication on 3 Heterogeneous Processors
Author :
DeFlumere, Ashley ; Lastovetsky, Alexey
Author_Institution :
Sch. of Comput. Sci. & Inf., Univ. Coll. Dublin, Dublin, Ireland
Abstract :
Parallel Matrix-Matrix Multiplication (MMM) is a fundamental part of the linear algebra libraries used by scientific applications on high performance computers. As heterogeneous systems have emerged as high performance computing platforms, the traditional homogeneous algorithms have been adapted to these heterogeneous environments. Although heterogeneous systems have been in use for some time, how to optimally partition data on heterogeneous processors to minimize computation, communication, and execution time remains an open problem. The question of how to subdivide MMM problems among heterogeneous processors has been studied, but that prior work assumes that the data partition shape, i.e. the layout of the data within the matrix assigned to each processor, should be rectangular, so that each processor is assigned a rectangular portion of the matrix to compute. Our previous work in this area questioned the optimality of this traditional rectangular shape and studied the partition shape problem for two processors. In that work, we proposed a novel mathematical method for transforming partition shapes to decrease communication cost and an analytical technique for determining the optimal shape. In this work, we extend this technique to three or more heterogeneous processors. While applying the method to two processors is relatively straightforward, the complexity grows immensely when considering three processors. With this complexity in mind, we propose a hybrid of experimental and analytical techniques. We postulate that a small number of partition shapes are potentially optimal, and we perform extensive testing, using a computer-aided method to apply our previously developed analytical technique, without finding a counterexample. We identify six data partition shapes that are candidates for the optimal three-processor shape.
Keywords :
linear algebra; matrix multiplication; parallel processing; MMM; analytical technique; computer aided method; data partition shapes; heterogeneous environments; heterogeneous processor; heterogeneous systems; high performance computer; high performance computing platform; homogeneous algorithms; linear algebra library; optimal data partitioning shape; optimal three processor shape; optimally partition data; parallel matrix matrix multiplication; partition shape problem; rectangular shape; Computational modeling; Finite element analysis; Partitioning algorithms; Program processors; Shape; Software algorithms; Transmission line matrix methods; Heterogeneous Computing; High Performance Computing; Matrix Partitioning; Parallel Matrix Multiplication;
Conference_Title :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
DOI :
10.1109/IPDPSW.2014.8