DocumentCode
3757196
Title
A Distributed Memory Based Embedded CGRA for Accelerating Stencil Computations
Author
Shohei Takeuchi;Yuttakon Yuttakonkit;Shinya Takamaeda-Yamazaki;Yasuhiko Nakashima
Author_Institution
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. &
fYear
2015
Firstpage
385
Lastpage
391
Abstract
Stencil computation is one of the basic but important operation patterns for various applications, such as image processing. Various GPU-based and application-specific hardware approaches have recently been proposed. However, the available energy capacity and hardware size are limited in embedded systems. Therefore, an energy-efficient, small-footprint, high-performance accelerator is necessary for constructing an intelligent computation platform. We develop an embedded CGRA accelerator with distributed on-chip memory blocks for both energy- and memory-bandwidth-efficient stencil computation. In this paper, we implemented a real LSI and its FPGA-based evaluation platform using Xilinx Zynq and Debian Linux. The evaluation result shows that the accelerator achieves 2.5x higher performance and 2.3x lower energy consumption compared to the ARM core of the Zynq. We then estimated the performance and energy efficiency of the accelerator. The estimation result shows that the accelerator manufactured in a 28 nm process achieves 1.61x better energy efficiency than the mobile GPU.
Keywords
"Hardware","Registers","Graphics processing units","Energy consumption","Jacobian matrices","Large scale integration","Computer architecture"
Publisher
IEEE
Conference_Titel
Computing and Networking (CANDAR), 2015 Third International Symposium on
Electronic_ISBN
2379-1896
Type
conf
DOI
10.1109/CANDAR.2015.110
Filename
7424743