DocumentCode
726358
Title
Revisiting accelerator-rich CMPs: Challenges and solutions
Author
Teimouri, Nasibeh ; Tabkhi, Hamed ; Schirner, Gunar
Author_Institution
Dept. of Electr. & Comput. Eng., Northeastern Univ. Boston, Boston, MA, USA
fYear
2015
fDate
8-12 June 2015
Firstpage
1
Lastpage
6
Abstract
Heterogeneous Chip Multi-Processors (CMPs), which combine processor cores with specialized HW accelerators, are one main approach to high-performance, low-power computing. While this approach is promising for a few accelerators, scalability becomes a major challenge as the number of accelerators increases. Resources including memory, communication fabric, and processor turn into bottlenecks, causing accelerator under-utilization and crippling performance. This paper analyzes the scalability of heterogeneous CMPs with many accelerators and identifies bottlenecks and their impact on system performance. It introduces an analytical method for scalability/bottleneck analysis that is backed up by a simulation-based performance analysis (using automatically generated virtual platforms). This paper proposes a novel architecture template, Transparent Self-Synchronizing (TSS) accelerators, for efficient and scalable realization of streaming applications. TSS achieves this efficiency and scalability through configurable point-to-point connections and self-synchronization between HW accelerators, and through efficient management of the accelerators' memory. The paper demonstrates the TSS benefits using both analytical and simulation methods. TSS significantly reduces the pressure on the communication fabric, the processor load, and the memory requirements, improving scalability. Even with an increasing number of accelerators, TSS can achieve more than 85% accelerator utilization. In contrast, in ACC-based CMPs the accelerator utilization drops quickly: to less than 40% with six accelerators, and lower still with more. The scalability benefits of TSS become more pronounced as the number of hardware accelerators increases.
Keywords
integrated circuit reliability; low-power electronics; microprocessor chips; ACC; TSS accelerators; accelerator-rich CMP; analytical method; automatically generated virtual platforms; bottleneck analysis; communication fabric; configurable point-to-point connections; heterogeneous chip multi processors; high-performance low-power computing; memory; memory requirements; processor cores; processor load; processor turn; scalability analysis; self synchronization; simulation-based performance analysis; specialized HW accelerators; streaming applications; transparent self-synchronizing accelerators; Fabrics; Logic gates; Ports (Computers); Runtime; Synchronization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE
Conference_Location
San Francisco, CA
Type
conf
DOI
10.1145/2744769.2744902
Filename
7167268
Link To Document