Title :
Revisiting communication performance models for computational clusters
Author :
Lastovetsky, Alexey ; Rychkov, Vladimir ; Flynn, Maureen O.
Author_Institution :
Sch. of Comput. Sci. & Inf., Univ. Coll. Dublin, Dublin, Ireland
Abstract :
In this paper, we analyze restrictions of traditional models affecting the accuracy of analytical prediction of the execution time of collective communication operations. In particular, we show that the constant and variable contributions of processors and network are not fully separated in these models. Full separation of the contributions that have different nature and arise from different sources will lead to more intuitive and accurate models, but the parameters of such models cannot be estimated from only the point-to-point experiments, which are usually used for traditional models. We are making the point that all the traditional models are designed so that their parameters can be estimated from a set of point-to-point communication experiments. In this paper, we demonstrate that the more intuitive models allow for much more accurate analytical prediction of the execution time of collective communication operations on both homogeneous and heterogeneous clusters. We present in detail one such a point-to-point model and how it can be used for prediction of the execution time of scatter and gather. We describe a set of communication experiments sufficient for accurate estimation of its parameters, and we conclude with presentation of experimental results demonstrating that the model much more accurately predicts the execution time of collective operations than traditional models.
Keywords :
workstation clusters; communication performance models; computational clusters; heterogeneous clusters; homogeneous clusters; Analytical models; Clustering algorithms; Communication switching; Computational modeling; Computer science; Concurrent computing; Parameter estimation; Performance analysis; Predictive models; Switches; Computational cluster; MPI; analytical prediction of execution time; communication performance model; estimation of parameters;
Conference_Titel :
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
Conference_Location :
Rome
Print_ISBN :
978-1-4244-3751-1
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2009.5160918