DocumentCode :
3588946
Title :
NestedMP: Taming Complex Configuration Space of Degree of Parallelism for Nested-Parallel Programs
Author :
Jiangzhou He ; Wenguang Chen ; Zhizhong Tang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2014
Firstpage :
339
Lastpage :
348
Abstract :
It is beneficial to exploit multiple levels of parallelism in a wide range of applications, because a typical server already has tens of processor cores. As the number of cores per computer increases rapidly, efficient support for nested parallelism will become more important. However, nested parallelism is much harder to program than single-level parallelism, because its configuration space of degree of parallelism is far more complex. Current parallel programming models such as OpenMP offer only naive support for nested parallelism: programmers must explicitly specify the number of threads for each parallel task to obtain reasonable performance. This approach has two drawbacks. First, writing code that determines appropriate configurations for different environments and contexts is complicated. Second, the runtime system lacks sufficient global information about thread allocation to make optimal task-core mapping decisions, which can easily cause significant performance loss. To address these problems, we propose NestedMP, a set of directives that extends OpenMP. NestedMP adopts a model that propagates available threads over the task tree top-down. When high-level parallel tasks are launched, this gives the runtime system global information about thread allocation and helps it make locality-aware task-core mapping decisions. In addition, instead of configuring the number of threads explicitly, programmers control it through policies defined in NestedMP. We have written a few benchmarks in NestedMP, which show that NestedMP makes the code more concise in most cases. We have implemented NestedMP in GCC 4.8.2 and tested the performance of these benchmarks on a 4-way 8-core Sandy Bridge server. The results show that NestedMP significantly improves performance over GCC's OpenMP implementation.
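The abstract describes NestedMP's thread-propagation model only at a high level. The following Python sketch illustrates the general idea of handing available cores down a task tree top-down, so that each subtree receives a contiguous block of cores before its tasks launch. It is not NestedMP's implementation: the `Task` class, the `propagate` function, the per-task `weight` field, and the proportional-split policy are hypothetical assumptions chosen for illustration. The 32-core example mirrors the 4-way 8-core server mentioned in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task-tree node; not part of NestedMP's actual interface."""
    name: str
    weight: int = 1  # assumed positive relative amount of work in this subtree
    children: list[Task] = field(default_factory=list)
    cores: range = range(0)  # contiguous core block, filled in by propagate()


def propagate(task: Task, cores: range) -> None:
    """Give `cores` (a step-1 range) to `task`, then split it among its children.

    Assumed policy: each child gets a contiguous block sized in proportion to
    its weight, with at least one core per child, so sibling subtrees stay on
    neighbouring cores.
    """
    if len(cores) == 0:
        raise ValueError(f"task {task.name!r} needs at least one core")
    task.cores = cores
    kids = task.children
    if not kids:
        return
    n = len(cores)
    if n < len(kids):
        # Fewer cores than children: children share cores round-robin.
        for i, child in enumerate(kids):
            core = cores[i % n]
            propagate(child, range(core, core + 1))
        return
    total = sum(child.weight for child in kids)
    start, remaining = cores.start, n
    for i, child in enumerate(kids):
        left_after = len(kids) - 1 - i
        if left_after == 0:
            share = remaining  # last child takes whatever is left
        else:
            share = max(1, n * child.weight // total)
            # Reserve at least one core for each later sibling.
            share = min(share, remaining - left_after)
        propagate(child, range(start, start + share))
        start += share
        remaining -= share


# Example: 32 cores, as on the 4-way 8-core server in the abstract.
a = Task("A", weight=3, children=[Task("A1"), Task("A2")])
b = Task("B", weight=1)
root = Task("root", children=[a, b])
propagate(root, range(32))
for t in (root, a, *a.children, b):
    print(t.name, t.cores)
# root range(0, 32)
# A range(0, 24)
# A1 range(0, 12)
# A2 range(12, 24)
# B range(24, 32)
```

Because the split is decided at the top before any subtask starts, A1 and A2 land on neighbouring cores inside A's block. This is the kind of global thread-allocation information that, according to the abstract, the runtime lacks when thread counts are specified explicitly for each parallel task.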
Keywords :
multi-threading; multiprocessing systems; parallel machines; 4-way 8-core SandyBridge server; GCC 4.8.2; NestedMP; OpenMP; complex configuration space; high level parallel task; locality-aware task-core mapping decision; nested-parallel programming model; programmer control; single-level parallelism degree; thread allocation; Context; Dynamic scheduling; Heuristic algorithms; Instruction sets; Parallel processing; Runtime; Servers;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
ISSN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2014.51
Filename :
7103469