• DocumentCode
    726358
  • Title
    Revisiting accelerator-rich CMPs: Challenges and solutions

  • Author
    Teimouri, Nasibeh ; Tabkhi, Hamed ; Schirner, Gunar

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Northeastern Univ. Boston, Boston, MA, USA
  • fYear
    2015
  • fDate
    8-12 June 2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Heterogeneous Chip Multi-Processors (CMPs), which combine processor cores with specialized HW accelerators, are one main approach to high-performance, low-power computing. While this approach is promising for a few accelerators, scalability becomes a major challenge as the number of accelerators increases. Resources including memory, the communication fabric, and the processor turn into bottlenecks, resulting in accelerator under-utilization and crippling performance. This paper analyzes the scalability of heterogeneous CMPs with many accelerators and identifies bottlenecks and their impact on system performance. It introduces an analytical method for scalability/bottleneck analysis that is backed up by a simulation-based performance analysis (using automatically generated virtual platforms). The paper proposes a novel architecture template, Transparent Self-Synchronizing (TSS) accelerators, for efficient and scalable realization of streaming applications. TSS achieves this efficiency and scalability through configurable point-to-point connections and self-synchronization between HW accelerators, and through efficient management of the accelerators' memory. The paper demonstrates the benefits of TSS using both analytical and simulation methods. TSS significantly reduces the pressure on the communication fabric, the processor load, and the memory requirements, improving scalability. Even with an increasing number of accelerators, TSS can achieve more than 85% accelerator utilization. In contrast, in ACC-based CMPs accelerator utilization drops quickly: to less than 40% with six accelerators, and lower still with more. The scalability benefits of TSS become more pronounced as the number of hardware accelerators increases.
  • Keywords
    integrated circuit reliability; low-power electronics; microprocessor chips; ACC; TSS accelerators; accelerator-rich CMP; analytical method; automatically generated virtual platforms; bottleneck analysis; communication fabric; configurable point-to-point connections; heterogeneous chip multi processors; high-performance low-power computing; memory; memory requirements; processor cores; processor load; processor turn; scalability analysis; self synchronization; simulation-based performance analysis; specialized HW accelerators; streaming applications; transparent self-synchronizing accelerators; Fabrics; Logic gates; Ports (Computers); Runtime; Synchronization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE
  • Conference_Location
    San Francisco, CA
  • Type
    conf
  • DOI
    10.1145/2744769.2744902
  • Filename
    7167268