DocumentCode :
1920348
Title :
A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems
Author :
Pilla, Laércio L. ; Ribeiro, Christiane Pousa ; Cordeiro, Daniel ; Mei, Chao ; Bhatele, Abhinav ; Navaux, Philippe O A ; Broquedis, François ; Méhaut, Jean-François ; Kale, Laxmikant V.
Author_Institution :
Inst. of Inf., Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
fYear :
2012
fDate :
10-13 Sept. 2012
Firstpage :
118
Lastpage :
127
Abstract :
Multi-core compute nodes with non-uniform memory access (NUMA) are now a common architecture in the assembly of large-scale parallel machines. On these machines, in addition to the network communication costs, the memory access costs within a compute node are also asymmetric. Ignoring this asymmetry can increase data movement costs. Therefore, to fully exploit the potential of these nodes and reduce data access costs, it becomes crucial to have a complete view of the machine topology (i.e., the compute node topology and the interconnection network among the nodes). Furthermore, the behavior of the parallel application plays an important role in determining how to utilize the machine efficiently. In this paper, we propose a hierarchical load balancing approach to improve the performance of applications on parallel multi-core systems. We introduce NucoLB, a topology-aware load balancer that focuses on redistributing work while reducing communication costs among and within compute nodes. NucoLB takes the asymmetric memory access costs present on NUMA multi-core compute nodes, the interconnection network overheads, and the application communication patterns into account in its balancing decisions. We have implemented NucoLB using the Charm++ parallel runtime system and evaluated its performance. Results show that our load balancer improves performance by up to 20% when compared to state-of-the-art load balancers on three different NUMA parallel machines.
Keywords :
cost reduction; digital storage; multiprocessing systems; multiprocessor interconnection networks; network topology; parallel architectures; parallel machines; resource allocation; CHARM++ parallel runtime system; NUCOLB; NUMA multicore compute nodes; NUMA parallel machines; asymmetric memory access cost; balancing decisions; data access costs reduction; data movement cost; hierarchical load balancing approach; interconnection network overheads; large-scale parallel machines; machine topology; network communication cost; nonuniform memory access cost; parallel application; parallel multicore systems; redistributing work; state-of-the-art load balancers; topology-aware load balancer; Benchmark testing; Load management; Multiprocessor interconnection; Network topology; Parallel machines; Runtime; Topology; cluster; load balancing; memory affinity; multi-core; non-uniform memory access; topology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2012 41st International Conference on
Conference_Location :
Pittsburgh, PA
ISSN :
0190-3918
Print_ISBN :
978-1-4673-2508-0
Type :
conf
DOI :
10.1109/ICPP.2012.9
Filename :
6337573