Title :
Load balancing, broadcast, and scatter primitives for efficient multicore applications
Author :
Miltos D. Grammatikakis;Antonis Papagrigoriou;Polydoros Petrakis;Kostas Harteros;George Kornaros
Author_Institution :
Technological Educational Institute of Crete, Heraklion, Greece
Abstract :
Efficient parallel execution of scientific and transaction-oriented applications requires reducing communication/synchronization overheads by improving locality through explicit methods that capture the underlying access patterns. In this work, we propose low-cost hardware that supports load balancing and parallel broadcast/scatter macro-operations. We evaluate these primitives using a cycle-accurate SystemC virtual platform of a multicore System-on-Chip (SoC) that interconnects cycle-accurate processor models (Cortex-A9) and a memory hierarchy via a hypercube Network-on-Chip (NoC). Results from executing a typical parallel matrix multiplication benchmark on a small-range embedded multicore SoC indicate average execution time improvements of 25% for load balancing, 21% for broadcast/scatter primitives, and 50% collectively when both primitives are used. While load balancing relies only on remote shared-memory access principles, synthesis on the ZedBoard's Zynq 7020 FPGA indicates a very low area cost for the scatter operation compared to an industrial DMA-based scatter/gather solution.
Keywords :
"Instruction sets","Load management","Multicore processing","Hardware","Monitoring","Arrays","Message systems"
Conference_Title :
Intelligent Solutions in Embedded Systems (WISES), 2015 12th International Workshop on