• DocumentCode
    3757196
  • Title
    A Distributed Memory Based Embedded CGRA for Accelerating Stencil Computations
  • Author
    Shohei Takeuchi;Yuttakon Yuttakonkit;Shinya Takamaeda-Yamazaki;Yasuhiko Nakashima
  • Author_Institution
    Grad. Sch. of Inf. Sci., Nara Inst. of Sci. &
  • fYear
    2015
  • Firstpage
    385
  • Lastpage
    391
  • Abstract
    Stencil computation is a basic but important operation pattern in various applications, such as image processing. Various GPU-based and application-specific hardware approaches have recently been proposed. However, the available absolute energy capacity and hardware size are limited in embedded systems. Therefore, an energy-efficient, small-footprint, high-performance accelerator is necessary for constructing an intelligent computation platform. We develop an embedded CGRA accelerator with distributed on-chip memory blocks for both energy- and memory-bandwidth-efficient stencil computation. In this paper, we implemented a real LSI and its FPGA-based evaluation platform using Xilinx Zynq and Debian Linux. The evaluation result shows that the accelerator achieves 2.5x higher performance and 2.3x lower energy consumption compared to the ARM core on the Zynq. We then estimated the performance and energy efficiency of the accelerator. The estimation result shows that the accelerator manufactured in a 28nm process achieves 1.61x better energy efficiency than the mobile GPU.
  • Keywords
    "Hardware","Registers","Graphics processing units","Energy consumption","Jacobian matrices","Large scale integration","Computer architecture"
  • Publisher
    ieee
  • Conference_Titel
    Computing and Networking (CANDAR), 2015 Third International Symposium on
  • Electronic_ISBN
    2379-1896
  • Type
    conf
  • DOI
    10.1109/CANDAR.2015.110
  • Filename
    7424743