Title :
Fast and lightweight support for nested parallelism on cluster-based embedded many-cores
Author :
Marongiu, Andrea ; Burgio, Paolo ; Benini, Luca
Author_Institution :
DEIS, Univ. of Bologna, Bologna, Italy
Abstract :
Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared-memory clusters. A hierarchical interconnection system is used, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory operations non-uniform (NUMA). Nested parallelism is a powerful programming abstraction for these architectures: a first level of parallelism distributes coarse-grained tasks to clusters, and additional levels of fine-grained parallelism are distributed to processors within a cluster. This paper presents lightweight and highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the cost of enabling multi-level parallelization and demonstrate that our techniques allow high degrees of parallelism to be extracted.
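The two-level decomposition described in the abstract can be illustrated with a minimal sketch. This is not the paper's runtime: it only mimics the shape of nested parallelism with Python thread pools on a host machine. The names NUM_CLUSTERS, CORES_PER_CLUSTER, cluster_task and nested_sum_of_squares are hypothetical, and the cluster and core counts are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CLUSTERS = 4        # assumption: number of clusters (outer level)
CORES_PER_CLUSTER = 4   # assumption: processors per cluster (inner level)


def fine_grained_work(x):
    """Fine-grained unit of work executed by one processor."""
    return x * x


def cluster_task(chunk):
    """Coarse-grained task: one cluster processes one chunk.

    Inside the cluster, the chunk is split again across its processors,
    which is the second (nested) level of parallelism.
    """
    with ThreadPoolExecutor(max_workers=CORES_PER_CLUSTER) as cores:
        return sum(cores.map(fine_grained_work, chunk))


def nested_sum_of_squares(data):
    """First level: distribute one coarse-grained chunk to each cluster."""
    # Ceiling division, with a floor of 1 so empty input yields no chunks.
    size = max(1, -(-len(data) // NUM_CLUSTERS))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=NUM_CLUSTERS) as clusters:
        return sum(clusters.map(cluster_task, chunks))


if __name__ == "__main__":
    print(nested_sum_of_squares(list(range(100))))
```

The outer pool stands in for the distribution of coarse-grained tasks to clusters, and each inner pool for the distribution of fine-grained work to the processors of one cluster. The sketch models neither NUMA effects nor the cost of creating and synchronizing nested teams, which is what the paper evaluates.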
Keywords :
embedded systems; network-on-chip; shared memory systems; NUMA; NoC; cluster based embedded manycores; fine grained parallelism; hierarchical interconnection system; manycore accelerators; memory operations nonuniform; nested parallelism; programming abstraction; shared memory clusters; Arrays; Instruction sets; Parallel processing; Programming; Synchronization
Conference_Title :
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012
Conference_Location :
Dresden
Print_ISBN :
978-1-4577-2145-8
DOI :
10.1109/DATE.2012.6176441