DocumentCode
124012
Title
Heterogeneous dataflow architectures for FPGA-based sparse LU factorization
Author
Siddhartha; Kapre, Nachiket
Author_Institution
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2014
fDate
2-4 Sept. 2014
Firstpage
1
Lastpage
4
Abstract
FPGA-based token dataflow architectures with heterogeneous computation and communication subsystems can accelerate the hard-to-parallelize, irregular computations in sparse LU factorization. We combine software pre-processing and architecture customization to fully expose and exploit the heterogeneity underlying the factorization algorithm. We perform a one-time pre-processing of the sparse matrices in software to generate dataflow graphs that capture the raw parallelism in the computation through substitution and reassociation transformations. We customize the dataflow architecture by choosing a mix of addition and multiplication processing elements that matches the balance of operations observed in the dataflow graphs. Additionally, we modify the network-on-chip to route certain critical dependencies over a separate, faster communication channel while relegating less-critical traffic to the existing channels. Under the same hardware constraints, our techniques achieve speedups of up to 37% over existing state-of-the-art FPGA-based sparse LU factorization systems, which already run 3–4× faster than CPU-based sparse LU solvers.
Keywords
data flow graphs; field programmable gate arrays; matrix decomposition; network-on-chip; reconfigurable architectures; sparse matrices; CPU-based sparse LU solvers; FPGA-based sparse LU factorization algorithm; FPGA-based token dataflow architectures; architecture customization; communication channel; communication subsystems; dataflow graphs; hardware constraints; heterogeneous computation; heterogeneous dataflow architectures; multiplication processing elements; raw parallelism; software pre-processing; Benchmark testing; Computer architecture; Field programmable gate arrays; Hardware; Optimization; Parallel processing; Sparse matrices
fLanguage
English
Publisher
ieee
Conference_Titel
Field Programmable Logic and Applications (FPL), 2014 24th International Conference on
Conference_Location
Munich
Type
conf
DOI
10.1109/FPL.2014.6927401
Filename
6927401