Title :
High-Level-Synthesis extensions for scalable Single-Chip Many-Accelerators on FPGAs
Author :
Dionysios Diamantopoulos;Sotirios Xydis;Kostas Siozios;Dimitrios Soudris
Author_Institution :
School of Electrical and Computer Engineering, National Technical University of Athens, Greece
Abstract :
Accelerator-coupled systems have been introduced as a promising architectural paradigm that can boost the performance and improve the power efficiency of general-purpose computing platforms. This research focuses on the accelerators' scalability problem caused by resource under-utilization in FPGA-based accelerator-coupled platforms. Recognizing that static memory allocation, the de facto memory management mechanism supported by modern design techniques and synthesis tools, is the main source of memory-induced under-utilization (leading to up to 75% dark silicon), we propose a) a Single-Chip Many-Accelerator (SCMA) architecture that reduces the energy budget by providing high-throughput processing nodes hooked under the same low-latency FPGA die and b) a novel design framework that extends conventional RTL and High-Level Synthesis (HLS) design flows with dynamic memory management (DMM) features. The framework improves scalability by enabling accelerators to dynamically adapt their allocated memory to runtime memory requirements, thus maximizing the overall accelerator count through effective sharing of the FPGA's memory resources. By applying these techniques in the state-of-the-art Vivado-HLS tool, we increased accelerator density by up to 3.8× on a Xilinx Ultrascale device and delivered architecture solutions that trade off per-accelerator latency overhead (1.2× to 19.9×) against overall system throughput (2.6× to 23.1×) and performance-per-watt (0.09× to 21.7×).
Keywords :
"Field programmable gate arrays","Memory management","Resource management","Throughput","Silicon","System-on-chip"
Conference_Title :
Field Programmable Logic and Applications (FPL), 2015 25th International Conference on
DOI :
10.1109/FPL.2015.7293992