Title :
A Load-Distributed Linpack Implementation for Heterogeneous Clusters
Author :
David Rohr;Volker Lindenstruth
Author_Institution :
Frankfurt Inst. for Adv. Studies, Frankfurt, Germany
Abstract :
In recent years, heterogeneous HPC systems, which combine traditional processors with accelerator cards such as GPUs, have been shown to deliver superior performance and power efficiency. Since different scientific problems place different demands on the computer architecture, some general-purpose supercomputers consist of different types of nodes, where each type is best suited for certain applications. Such clusters, with inter-node heterogeneity (different types of nodes) on top of intra-node heterogeneity (different processors inside one node), consist of compute nodes that differ in compute performance. The standard implementation of the Linpack benchmark, HPL, distributes the workload evenly among all processes and thus cannot exploit the cluster's full potential if the nodes have unequal performance. This paper presents a new feature of our HPL-GPU implementation that allows a balanced, fine-tuned workload distribution among all compute nodes, taking into account their individual compute capabilities. We present results on nodes of different speed grades on the LOEWE-CSC cluster and demonstrate that our implementation can utilize all nodes of a heterogeneous configuration efficiently, showing only about 3% granularity loss.
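The idea of a performance-weighted workload distribution, and of the granularity loss that remains because work can only be assigned in whole blocks, can be illustrated with a minimal sketch. This is not the HPL-GPU algorithm from the paper; the function names, the largest-remainder rounding, and the loss definition are assumptions chosen for illustration only.

```python
import math

def distribute_blocks(num_blocks, node_speeds):
    """Split num_blocks among nodes in proportion to their relative speeds.

    Hypothetical helper: uses largest-remainder rounding so that every
    node receives a whole number of blocks and all blocks are assigned.
    """
    total_speed = sum(node_speeds)
    ideal = [num_blocks * s / total_speed for s in node_speeds]
    counts = [math.floor(x) for x in ideal]
    leftover = num_blocks - sum(counts)
    # Hand the remaining blocks to the nodes with the largest fractional part.
    order = sorted(range(len(node_speeds)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

def granularity_loss(counts, node_speeds):
    """Fraction of aggregate capacity left idle by whole-block rounding.

    Assumes the run time is set by the node that finishes last, and
    compares it with the time a perfectly divisible workload would need.
    """
    finish_time = max(c / s for c, s in zip(counts, node_speeds))
    ideal_time = sum(counts) / sum(node_speeds)
    return 1.0 - ideal_time / finish_time

if __name__ == "__main__":
    speeds = [1.0, 2.0]            # assumed relative node speeds
    counts = distribute_blocks(10, speeds)
    print(counts)                  # [3, 7]
    print(granularity_loss(counts, speeds))
```

With an even split, the faster node in this example would sit idle while the slower one finishes; the weighted split reduces that idle time to the rounding error, which shrinks as the number of blocks grows relative to the number of nodes.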
Keywords :
"Graphics processing units","Benchmark testing","Supercomputers","Standards","Hardware","Niobium"
Conference_Title :
High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
DOI :
10.1109/HPCC-CSS-ICESS.2015.17