Regional Information Center for Science and Technology - Multilevel Granularity Parallelism Synthesis on FPGAs

DocumentCode :

3183790

Title :

Multilevel Granularity Parallelism Synthesis on FPGAs

Author :

Papakonstantinou, Alexandros ; Liang, Yun ; Stratton, John A. ; Gururaj, Karthik ; Chen, Deming ; Hwu, Wen-Mei W. ; Cong, Jason

Author_Institution :

Electr. & Comput. Eng. Dept., Univ. of Illinois, Urbana, IL, USA

fYear :

2011

fDate :

1-3 May 2011

Firstpage :

178

Lastpage :

185

Abstract :

Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementing and evaluating the performance of HLS-generated RTL involves lengthy logic synthesis and physical design flows. Moreover, the mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance in terms of both cycle count and clock frequency. Evaluating the rich design space through the full implementation flow, starting with high-level source code and ending with a routed netlist, is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic, in tandem with the estimation models and design layout information, to derive a configuration that is near-optimal in terms of performance. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that can significantly outperform previous related tools, while offering performance competitive with software kernel execution on GPUs at a fraction of the energy cost.
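As a rough illustration of the trade-off the abstract describes (adding parallel hardware lowers the cycle count but can lengthen the clock period), the Python sketch below enumerates a small design space and picks the configuration with the lowest estimated execution time, computed as cycles times clock period. Everything in it is an assumption for illustration: the two knobs (`cores` for coarse-grained replication, `unroll` for fine-grained lanes within a core), the LUT budget, the linear resource model, and the quadratic congestion-based clock period model are all hypothetical, and the exhaustive search is a plain stand-in, not the ML-GPS heuristic.

```python
from dataclasses import dataclass
from itertools import product

# All constants and models below are hypothetical, for illustration only.
DEVICE_LUTS = 100_000      # assumed LUT budget of the target FPGA
CORE_BASE_LUTS = 4_000     # assumed fixed LUT cost of one kernel core
LUTS_PER_UNROLL = 1_500    # assumed extra LUTs per parallel lane in a core
TOTAL_WORK = 1_000_000     # assumed number of kernel operations
FIXED_OVERHEAD = 2_000     # assumed fixed cycles (pipeline fill, sync)
BASE_PERIOD_NS = 5.0       # assumed clock period of a lightly used device
CONGESTION_ALPHA = 0.6     # assumed period growth with utilization


@dataclass(frozen=True)
class Config:
    cores: int   # coarse-grained parallelism: replicated kernel cores
    unroll: int  # fine-grained parallelism: parallel lanes per core


def resource_luts(c: Config) -> int:
    """Linear resource model: each core costs a base plus per-lane LUTs."""
    return c.cores * (CORE_BASE_LUTS + c.unroll * LUTS_PER_UNROLL)


def cycles(c: Config) -> int:
    """Work split evenly over all lanes (ceiling division) plus fixed overhead."""
    lanes = c.cores * c.unroll
    return -(-TOTAL_WORK // lanes) + FIXED_OVERHEAD


def clock_period_ns(c: Config) -> float:
    """Clock period grows quadratically with device utilization (congestion)."""
    util = resource_luts(c) / DEVICE_LUTS
    return BASE_PERIOD_NS * (1.0 + CONGESTION_ALPHA * util ** 2)


def exec_time_us(c: Config) -> float:
    """Estimated execution time = cycles x clock period, in microseconds."""
    return cycles(c) * clock_period_ns(c) / 1_000.0


def explore(core_options, unroll_options) -> Config:
    """Exhaustively score every configuration that fits the LUT budget."""
    feasible = [
        Config(n, u)
        for n, u in product(core_options, unroll_options)
        if resource_luts(Config(n, u)) <= DEVICE_LUTS
    ]
    return min(feasible, key=exec_time_us)


if __name__ == "__main__":
    best = explore([1, 2, 4, 8, 16], [1, 2, 4, 8])
    print(best, f"{exec_time_us(best):.1f} us")  # Config(cores=4, unroll=8) 207.1 us
```

Under these made-up numbers the search prefers 4 cores with 8 lanes each over 8 cores with 4 lanes each: both need the same number of cycles, but the first uses fewer LUTs and therefore, under the assumed congestion model, runs at a shorter clock period. In the approach the abstract describes, calibrated resource and clock period models, together with layout information, play the role of these toy functions.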

Keywords :

field programmable gate arrays; integrated circuit layout; logic design; CUDA kernel mapping; FPGA programming; FPGA-based accelerator; abstraction level; coarse grained parallelism; design layout information; design space search heuristic; hardware spatial parallelism; high-level synthesis technique; lengthy logic synthesis; multigranularity parallelism extraction; multilevel granularity parallelism synthesis; performance evaluation; physical design flow; reconfigurable computing; Arrays; Clocks; Estimation; Field programmable gate arrays; Instruction sets; Kernel; Parallel processing; Design Space Exploration; FPGA; High-Level Synthesis; Parallel Computing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Field-Programmable Custom Computing Machines (FCCM), 2011 IEEE 19th Annual International Symposium on

Conference_Location :

Salt Lake City, UT

Print_ISBN :

978-1-61284-277-6

Electronic_ISBN :

978-0-7695-4301-7

Type :

conf

DOI :

10.1109/FCCM.2011.29

Filename :

5771270

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3183790