DocumentCode :
1826540
Title :
Runtime Optimization of Broadcast Communications Using Dynamic Network Topology Information from MPI
Author :
Godwin, Jeffrey ; Karlsson, Christer ; Chen, Zizhong
Author_Institution :
Colorado School of Mines, Golden, CO, USA
fYear :
2012
fDate :
25-27 June 2012
Firstpage :
287
Lastpage :
294
Abstract :
Modern commodity compute clusters are often composed of many multi-core nodes connected to each other via a network. On multi-core clusters, inter-node network communication is typically an order of magnitude slower than communication between processes on the same node, which effectively creates a heterogeneous, tiered network topology. Presently, most MPI implementations assume a homogeneous network composition, which leads to suboptimal performance on multi-core clusters. In this paper, we treat a multi-core cluster as a heterogeneous cluster and optimize the performance of MPI broadcast communications by scheduling messages according to topology information. We experimentally demonstrate that previous heuristics for heterogeneous clusters, such as Fastest Edge First (FEF), do not produce optimal results for broadcast communications on multi-core clusters. Our solution is to modify the FEF heuristic by imposing an additional constraint that permits only one core per node to participate in inter-node communications, creating a nested binomial tree structure. Using this constraint, we achieve performance gains of 20% to 60% over the MPI broadcast implementation on homogeneous multi-core clusters.
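The constraint described in the abstract (only one core per node takes part in inter-node communication, which yields a nested binomial tree) can be illustrated with a minimal Python sketch that only builds the message schedule. It assumes ranks are packed contiguously by node (rank r lives on node r // cores_per_node), that the lowest rank on each non-root node acts as that node's leader, and that the intra-node phase starts only after the inter-node phase finishes. The function names are hypothetical, and this is not the authors' modified FEF scheduler, whose details the record does not give.

```python
from typing import List, Tuple

Round = List[Tuple[int, int]]  # one communication round: (sender, receiver) pairs


def binomial_schedule(members: List[int]) -> List[Round]:
    """Binomial-tree broadcast among `members`; members[0] is the root.

    In each round, every member that already holds the message sends it to
    one member that does not, so the number of holders doubles per round.
    """
    rounds: List[Round] = []
    have = 1  # members[0:have] already hold the message
    while have < len(members):
        step = [(members[i], members[i + have])
                for i in range(have) if i + have < len(members)]
        rounds.append(step)
        have *= 2
    return rounds


def nested_binomial_schedule(num_nodes: int, cores_per_node: int,
                             root: int = 0) -> Tuple[List[Round], List[Round]]:
    """Hypothetical two-level broadcast: inter-node tree of leaders, then intra-node trees.

    Assumes rank r lives on node r // cores_per_node and that the lowest rank
    on each non-root node is its leader (the root leads its own node).
    """
    def node_of(rank: int) -> int:
        return rank // cores_per_node

    def leader_of(node: int) -> int:
        return root if node == node_of(root) else node * cores_per_node

    # Phase 1: only one core per node (the leader) communicates over the network.
    leaders = [root] + [leader_of(n) for n in range(num_nodes) if n != node_of(root)]
    inter = binomial_schedule(leaders)

    # Phase 2: each leader broadcasts within its own node; nodes proceed in
    # parallel, so round k of every node's tree is merged into global round k.
    intra: List[Round] = []
    for n in range(num_nodes):
        first = n * cores_per_node
        lead = leader_of(n)
        cores = [lead] + [r for r in range(first, first + cores_per_node) if r != lead]
        for k, step in enumerate(binomial_schedule(cores)):
            if k == len(intra):
                intra.append([])
            intra[k].extend(step)
    return inter, intra


def count_inter_node(rounds: List[Round], cores_per_node: int) -> int:
    """Number of messages whose sender and receiver sit on different nodes."""
    return sum(1 for step in rounds for s, d in step
               if s // cores_per_node != d // cores_per_node)


if __name__ == "__main__":
    inter, intra = nested_binomial_schedule(num_nodes=4, cores_per_node=4)
    print("inter-node rounds:", inter)
    print("intra-node rounds:", intra)
    print("nested inter-node messages:", count_inter_node(inter + intra, 4))
    print("flat inter-node messages:",
          count_inter_node(binomial_schedule(list(range(16))), 4))
```

For 4 nodes of 4 cores each, a flat binomial tree over ranks 0-15 sends 12 of its 15 messages across the network, whereas the nested schedule sends only 3, one per non-root node. The sketch counts messages only; it does not model link speeds or reproduce the timing results reported in the paper.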
Keywords :
broadcast communication; computer communications software; message passing; multiprocessing systems; network topology; optimisation; processor scheduling; tree data structures; FEF heuristic; MPI broadcast communications; dynamic network topology information; fastest edge first heuristic; heterogeneous cluster; heterogeneous network topology; homogeneous network composition; inter-node network communications; message scheduling; modern commodity compute clusters; multicore clusters; multicore nodes; nested binomial tree structure; performance optimization; runtime optimization; tiered network topology; Benchmark testing; Clustering algorithms; Multicore processing; Network topology; Schedules; Timing; Topology; Broadcast; Cluster; Fastest Edges First; Message Passing Interface (MPI); Multicore;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)
Conference_Location :
Liverpool
Print_ISBN :
978-1-4673-2164-8
Type :
conf
DOI :
10.1109/HPCC.2012.46
Filename :
6332186