Regional Information Center for Science and Technology - Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization

DocumentCode :

188119

Title :

Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization

Author :

Siddhartha ; Kapre, Nachiket

Author_Institution :

Nanyang Technol. Univ., Singapore, Singapore

fYear :

2014

fDate :

11-13 May 2014

Firstpage :

Lastpage :

Abstract :

Substitution and reassociation of irregular sparse LU factorization can deliver up to 31% additional speedup over an existing state-of-the-art parallel FPGA implementation for which further parallelization was deemed virtually impossible. The state-of-the-art implementation is already capable of delivering 3× acceleration over CPU-based sparse LU solvers. Sparse LU factorization is a well-known computational bottleneck in many existing scientific and engineering applications and is notoriously hard to parallelize due to inherent sequential dependencies in the computation graph. In this paper, we show how to break these allegedly inherent dependencies using depth-limited substitution and reassociation of the resulting computation. This is a work-parallelism tradeoff that is well suited for implementation on FPGA-based token dataflow architectures. Such compute organizations are capable of fast parallel processing of large irregular graphs extracted from the sparse LU computation. We manage and control the growth in additional work due to substitution through careful selection of substitution depth. We exploit associativity in the generated graphs to restructure long compute chains into reduction trees.
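The reassociation idea in the abstract, restructuring a long compute chain into a reduction tree, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `chain_reduce` and `tree_reduce` are hypothetical, addition stands in for the accumulations that arise in an LU dataflow graph, and `depth` counts dependent addition steps as a rough proxy for the critical path. Substitution and its extra work are not modelled here.

```python
from fractions import Fraction


def chain_reduce(terms):
    """Accumulate left to right; each add depends on the previous one."""
    acc = terms[0]
    depth = 0
    for t in terms[1:]:
        acc = acc + t
        depth += 1  # one more step on the critical path
    return acc, depth


def tree_reduce(terms):
    """Accumulate pairwise; adds within one level are independent."""
    level = list(terms)
    depth = 0
    while len(level) > 1:
        # Combine neighbours; all of these adds could run in parallel.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        depth += 1  # one more level on the critical path
    return level[0], depth


# Exact rationals keep both orderings bit-identical for the comparison.
terms = [Fraction(k, 7) for k in range(1, 9)]
chain_value, chain_depth = chain_reduce(terms)
tree_value, tree_depth = tree_reduce(terms)
print(chain_value == tree_value, chain_depth, tree_depth)
```

For the eight terms above, both orderings give the same value, while the dependent-step count drops from 7 for the chain to 3 for the tree. Exact rationals are used so the two results compare equal; in floating-point arithmetic, reassociation can change rounding, so results may differ slightly.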

Keywords :

data flow computing; field programmable gate arrays; mathematics computing; matrix decomposition; parallel architectures; sparse matrices; trees (mathematics); CPU-based sparse LU solvers; FPGA-based sparse LU factorization; FPGA-based token dataflow architectures; depth-limited substitution; engineering applications; irregular graphs; irregular sparse LU factorization reassociation; parallel FPGA implementation; parallel processing; reduction trees; scientific applications; sequential dependencies; sparse LU computation; Benchmark testing; Circuit simulation; Computer architecture; Field programmable gate arrays; Hardware; Parallel processing; Sparse matrices; LU Factorization; reassociation; substitution;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on

Conference_Location :

Boston, MA

Print_ISBN :

978-1-4799-5110-9

Type :

conf

DOI :

10.1109/FCCM.2014.26

Filename :

6861588

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=188119