DocumentCode :
3048698
Title :
Exploiting memory customization in FPGA for 3D stencil computations
Author :
Shafiq, Muhammad ; Pericàs, Miquel ; De la Cruz, Raul ; Araya-Polo, Mauricio ; Navarro, Nacho ; Ayguadé, Eduard
Author_Institution :
Comput. Sci., Barcelona Supercomput. Center, Barcelona, Spain
fYear :
2009
fDate :
9-11 Dec. 2009
Firstpage :
38
Lastpage :
45
Abstract :
3D stencil computations are compute-intensive kernels often appearing in high-performance scientific and engineering applications. The key to efficiency in these memory-bound kernels is full exploitation of data reuse. This paper explores the design aspects for 3D stencil implementations that maximize the reuse of all input data on an FPGA architecture. The work focuses on the architectural design of 3D stencils with the form n × (n + 1) × n, where n = {2, 4, 6, 8, ...}. The performance of the architecture is evaluated using two design approaches, "Multi-Volume" and "Single-Volume". When n = 8, the designs achieve a sustained throughput of 55.5 GFLOPS in the "Single-Volume" approach and 103 GFLOPS in the "Multi-Volume" design approach in a 100-200 MHz multi-rate implementation on a Virtex-4 LX200 FPGA. This corresponds to a stencil data delivery of 1500 bytes/cycle and 2800 bytes/cycle respectively. The implementation is analyzed and compared to two CPU cache approaches and to the statically scheduled local stores on the IBM PowerXCell 8i. The FPGA approaches designed here achieve much higher bandwidth despite the FPGA device being the least recent of the chips considered. These numbers show how a custom memory organization can provide large data throughput when implementing 3D stencil kernels.
Keywords :
field programmable gate arrays; signal processing; 3D stencil computations; FPGA; IBM PowerXCell 8i; data reuse; memory customization; memory organization; memory-bound kernels; Bandwidth; Computer applications; Field programmable gate arrays; Finite difference methods; Finite impulse response filter; Hardware; Kernel; Nearest neighbor searches; Throughput; Time domain analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Field-Programmable Technology, 2009. FPT 2009. International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-4375-8
Electronic_ISBN :
978-1-4244-4377-2
Type :
conf
DOI :
10.1109/FPT.2009.5377644
Filename :
5377644