Title :
An Approach for Energy Efficient Execution of Hybrid Parallel Programs
Author :
Ramapantulu, Lavanya ; Loghin, Dumitrel ; Teo, Yong Meng
Author_Institution :
Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore, Singapore
Abstract :
The hybrid programming model is becoming increasingly popular for HPC applications because it has the dual advantage of exploiting inter-node distributed-memory scalability and intra-node shared-memory performance in a cluster system. One of the key challenges for energy-efficient execution of hybrid programs is to determine time- and energy-efficient hardware configurations within a large system configuration space. Given a hybrid program with an execution time deadline and an energy budget, we propose a measurement-based analytical modelling approach to determine these system configurations. In contrast to current approaches, we model both inter- and intra-node resource overlaps, memory contention among cores within a node, and network contention across multiple nodes. The model is validated against direct measurement using five representative HPC applications on Intel Xeon and ARM clusters with diverse time-energy performance. We show that a Pareto frontier consisting of optimal configurations exists for a hybrid program running on homogeneous clusters. To further optimize the Pareto frontier, we introduce a new metric, useful computation ratio (UCR), to quantify the degree of resource contention and communication overhead in an execution. We discuss how UCR and Pareto-optimal configurations can be used in conjunction by system designers to gain further insights into system resource imbalances, and how application developers can further fine-tune their hybrid programs.
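The selection step described in the abstract (finding Pareto-optimal time-energy configurations, then keeping those that meet a deadline and an energy budget) can be illustrated with a minimal sketch. It is not the authors' model: the `Config` fields, the function names, and all sample values below are hypothetical and chosen only for illustration; in the paper, the time and energy of each configuration would come from the measurement-based analytical model.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One hardware configuration with its (hypothetical) predicted cost."""
    nodes: int
    cores: int
    freq_ghz: float
    time_s: float    # predicted execution time, seconds
    energy_j: float  # predicted energy use, joules


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep configs that no other config beats on both time and energy."""
    frontier = []
    best_energy = float("inf")
    # Sweep from fastest to slowest; a config survives only if it uses
    # strictly less energy than every config that is at least as fast.
    for c in sorted(configs, key=lambda c: (c.time_s, c.energy_j)):
        if c.energy_j < best_energy:
            frontier.append(c)
            best_energy = c.energy_j
    return frontier


def feasible(configs: List[Config], deadline_s: float, budget_j: float) -> List[Config]:
    """Keep configs that meet both the time deadline and the energy budget."""
    return [c for c in configs if c.time_s <= deadline_s and c.energy_j <= budget_j]


# Illustrative values only; these are not measurements from the paper.
SAMPLE = [
    Config(1, 4, 2.0, 100.0, 500.0),
    Config(2, 4, 2.0, 60.0, 650.0),
    Config(4, 4, 2.0, 40.0, 900.0),
    Config(2, 8, 1.2, 80.0, 700.0),   # dominated by the 60 s / 650 J config
    Config(4, 8, 2.0, 45.0, 1200.0),  # dominated by the 40 s / 900 J config
]

frontier = pareto_frontier(SAMPLE)
chosen = feasible(frontier, deadline_s=70.0, budget_j=800.0)

for c in chosen:
    print(f"{c.nodes} nodes x {c.cores} cores @ {c.freq_ghz} GHz: "
          f"{c.time_s} s, {c.energy_j} J")
```

Sorting by time and sweeping once keeps the frontier computation at O(n log n), which matters when the configuration space (nodes, cores per node, clock frequency) is large.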
Keywords :
Pareto optimisation; distributed shared memory systems; parallel programming; power aware computing; ARM clusters; HPC applications; Intel Xeon; Pareto frontier; Pareto-optimal configurations; UCR; energy efficient execution; energy efficient hardware configurations; hybrid parallel programs; hybrid programming model; internode distributed-memory scalability; intranode shared-memory performance; measurement-based analytical modeling approach; time efficient hardware configurations; time-energy performance; useful computation ratio; Analytical models; Clocks; Computational modeling; Current measurement; Hardware; Mathematical model; Memory management; MPI; OpenMP; Pareto-frontier; analytical model; hybrid program; inter-node; intra-node; overlap; resource contention
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
DOI :
10.1109/IPDPS.2015.71