Design of throughput-optimized arrays from recurrence abstractions

Author

Jacob, Arpith C. ; Buhler, Jeremy D. ; Chamberlain, Roger D.

Author_Institution

Dept. of Comput. Sci. & Eng., Washington Univ. in St. Louis, St. Louis, MO, USA

fYear

2010

fDate

7-9 July 2010

Firstpage

133

Lastpage

140

Abstract

Many compute-bound applications have seen order-of-magnitude speedups using special-purpose accelerators. FPGAs in particular are good at implementing recurrence equations realized as arrays. Existing high-level synthesis approaches for recurrence equations produce an array that is latency-space optimal. We target applications that operate on a large collection of small inputs, e.g. a database of biological sequences, where overall throughput is the most important measure of performance. In this work, we introduce a new design-space exploration procedure within the polyhedral framework to optimize throughput of a systolic array subject to area and bandwidth constraints of an FPGA device. Our approach is to exploit additional parallelism by pipelining multiple inputs on an array and multiple iteration vectors in a processing element. We prove that the throughput of an array is given by the inverse of the maximum number of iteration vectors executed by any processor in the array, which is determined solely by the array's projection vector. We have applied this observation to discover novel arrays for Nussinov RNA folding. Our throughput-optimized array is 2× faster than the standard latency-space optimal array, yet it uses 15% fewer LUT resources. We achieve a further 2× speedup by processor pipelining, with only a 37% increase in resources. Our tool suggests additional arrays that trade area for throughput and are 4–5× faster than the currently used latency-optimized array. These novel arrays are 70–172× faster than a software baseline.
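
The throughput claim in the abstract can be illustrated with a small sketch. This is not the authors' tool: it assumes a toy two-dimensional triangular iteration domain (1 <= i <= j <= n), a primitive integer projection vector u, and one iteration vector executed per processor per time step. The names `processor_loads`, `throughput`, and `triangle` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter
from math import gcd


def processor_loads(points, u):
    """Count the iteration vectors mapped to each processor.

    Two points land on the same processor when they differ by an integer
    multiple of the projection vector u. For a primitive u = (u1, u2),
    the value u1*j - u2*i is constant exactly along those lines, so it
    serves as a processor label (an assumption of this 2-D sketch).
    """
    u1, u2 = u
    if gcd(u1, u2) != 1:
        raise ValueError("projection vector must be primitive and non-zero")
    return Counter(u1 * j - u2 * i for i, j in points)


def throughput(points, u):
    """Inverse of the largest per-processor load, in inputs per time step."""
    return 1 / max(processor_loads(points, u).values())


def triangle(n):
    """Toy domain: all integer points (i, j) with 1 <= i <= j <= n."""
    return [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]


if __name__ == "__main__":
    domain = triangle(4)
    for u in [(0, 1), (1, 1), (1, -1)]:
        loads = processor_loads(domain, u)
        print(u, "processors:", len(loads), "throughput:", throughput(domain, u))
```

Under these assumptions, different projection vectors applied to the same domain give different maximum loads and processor counts, which is the area-versus-throughput trade-off that a design-space exploration would search over.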

Keywords

Computer applications; Constraint optimization; Databases; Design optimization; Difference equations; Field programmable gate arrays; High level synthesis; Pipeline processing; Systolic arrays; Throughput; Dynamic Programming; FPGA; Recurrences; Systolic Array; Throughput Optimization

fLanguage

English

Publisher

IEEE

Conference_Titel

Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on

Conference_Location

Rennes, France

ISSN

2160-0511

Print_ISBN

978-1-4244-6966-6

Electronic_ISBN

2160-0511

Type

conf

DOI

10.1109/ASAP.2010.5540753

Filename

5540753