DocumentCode :
3697006
Title :
A Load-Distributed Linpack Implementation for Heterogeneous Clusters
Author :
David Rohr;Volker Lindenstruth
Author_Institution :
Frankfurt Inst. for Adv. Studies, Frankfurt, Germany
fYear :
2015
Firstpage :
436
Lastpage :
443
Abstract :
In recent years, heterogeneous HPC systems, which combine traditional processors with accelerator cards such as GPUs, have been shown to deliver superior performance and power efficiency. Since different scientific problems place different demands on the computer architecture, some general-purpose supercomputers consist of different types of nodes, each suited best to certain applications. Such clusters, which exhibit inter-node heterogeneity (different types of nodes) on top of intra-node heterogeneity (different processors inside one node), consist of compute nodes with different compute performance. The standard implementation of the Linpack benchmark, HPL, distributes the workload evenly among all processes and thus cannot exploit the cluster's full potential if the nodes have unequal performance. This paper presents a new feature of our HPL-GPU implementation that enables a balanced, fine-tuned workload distribution among all compute nodes, taking into account their individual compute capabilities. We present results on nodes of different speed grades in the LOEWE-CSC cluster and demonstrate that our implementation can efficiently utilize all nodes of a heterogeneous configuration, with a granularity loss of only about 3%.
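To make the difference between an even split and a speed-proportional split concrete, the following Python sketch divides a hypothetical number of whole work blocks among nodes with assumed integer relative speeds and reports the resulting granularity loss, i.e. the share of ideal throughput lost because whole blocks cannot be divided to match each node's speed exactly. It is not the HPL-GPU algorithm: the function names `distribute_blocks` and `granularity_loss`, the 1-D block split, and the greedy rounding rule are illustrative assumptions, and the sketch ignores HPL's 2-D block-cyclic process grid and all communication costs.

```python
from fractions import Fraction


def distribute_blocks(speeds, total_blocks):
    """Assign whole blocks to nodes in proportion to their relative speeds.

    Assumption: speeds are positive integers (hypothetical relative node
    speeds). Each node first gets floor(total_blocks * speed / total_speed)
    blocks; the few leftover blocks are then handed out one at a time to the
    node whose finish time (blocks / speed) would grow the least.
    """
    total_speed = sum(speeds)
    counts = [total_blocks * s // total_speed for s in speeds]
    for _ in range(total_blocks - sum(counts)):
        # Exact rational comparison avoids floating-point noise on ties.
        best = min(range(len(speeds)),
                   key=lambda i: Fraction(counts[i] + 1, speeds[i]))
        counts[best] += 1
    return counts


def granularity_loss(speeds, counts):
    """Fraction of ideal throughput lost because blocks cannot be split."""
    ideal_time = sum(counts) / sum(speeds)                     # perfectly divisible work
    actual_time = max(c / s for c, s in zip(counts, speeds))  # slowest node finishes last
    return 1.0 - ideal_time / actual_time


if __name__ == "__main__":
    speeds = [3, 2, 1]  # hypothetical relative speeds of three nodes
    balanced = distribute_blocks(speeds, 100)
    print(balanced, f"{granularity_loss(speeds, balanced):.2%}")
    even = [34, 33, 33]  # even split of the same 100 blocks, for comparison
    print(even, f"{granularity_loss(speeds, even):.2%}")
```

With these assumed speeds, the proportional split leaves only the small rounding loss of the final few blocks, whereas the even split makes every node wait for the slowest one; the loss of the proportional split shrinks as the number of blocks per node grows.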
Keywords :
"Graphics processing units","Benchmark testing","Supercomputers","Standards","Hardware","Niobium"
Publisher :
IEEE
Conference_Title :
2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS)
Type :
conf
DOI :
10.1109/HPCC-CSS-ICESS.2015.17
Filename :
7336200