Title :
FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture
Author :
Jaiswal, Manish Kumar ; Chandrachoodan, Nitin
Author_Institution :
ICFAI Univ., Dehradun, India
Abstract :
Decomposition of a matrix into lower and upper triangular matrices (LU decomposition) is a vital part of many scientific and engineering applications, and the block LU decomposition algorithm is well suited to parallel hardware implementation. This paper presents an FPGA-based approach to accelerating the block LU decomposition algorithm. Unlike most previous approaches reported in the literature, it does not assume that the matrix can be stored entirely on chip. Memory accesses are studied for various FPGA configurations, and a schedule of operations that scales well is presented. The design has been synthesized for FPGA targets and can be easily retargeted. It outperforms previous hardware implementations, as well as tuned software implementations, including the ATLAS and MKL libraries, on workstations.
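For readers unfamiliar with the algorithm named in the abstract, the following is a minimal software sketch of a right-looking block LU decomposition. It is not the paper's FPGA architecture or schedule: the function names `block_lu` and `reconstruct`, the block size parameter `bs`, and the absence of pivoting are all assumptions made for illustration, and the sketch only works when every diagonal block can be factored without row exchanges (for example, a strictly diagonally dominant matrix).

```python
def block_lu(a, bs):
    """Hypothetical in-place right-looking block LU, no pivoting.

    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U. `bs` is the block size.
    """
    n = len(a)
    for k in range(0, n, bs):
        e = min(k + bs, n)  # end of the current diagonal block
        # Step 1: factor the diagonal block A11 = L11 * U11.
        for j in range(k, e):
            for i in range(j + 1, e):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: row panel, U12 = inverse(L11) * A12 (forward substitution).
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: column panel, L21 = A21 * inverse(U11).
        for r in range(e, n):
            for j in range(k, e):
                for i in range(k, j):
                    a[r][j] -= a[r][i] * a[i][j]
                a[r][j] /= a[j][j]
        # Step 4: trailing update, A22 = A22 - L21 * U12.
        for r in range(e, n):
            for c in range(e, n):
                s = 0.0
                for j in range(k, e):
                    s += a[r][j] * a[j][c]
                a[r][c] -= s
    return a


def reconstruct(lu):
    """Multiply the packed L and U factors back together (for checking)."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for p in range(min(i, j) + 1):
                l_ip = 1.0 if p == i else lu[i][p]  # unit diagonal of L
                s += l_ip * lu[p][j]
            out[i][j] = s
    return out


if __name__ == "__main__":
    original = [[10.0, 1.0, 2.0, 3.0],
                [1.0, 12.0, 3.0, 1.0],
                [2.0, 3.0, 15.0, 2.0],
                [3.0, 1.0, 2.0, 11.0]]
    packed = block_lu([row[:] for row in original], bs=2)
    product = reconstruct(packed)
    err = max(abs(product[i][j] - original[i][j])
              for i in range(4) for j in range(4))
    print("max reconstruction error:", err)
```

In this formulation the trailing update in step 4 is a matrix multiplication and dominates the operation count for large matrices, which is one reason blocked variants are commonly considered a good fit for parallel hardware.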
Keywords :
field programmable gate arrays; logic design; matrix decomposition; FPGA configurations; FPGA-based high-performance; lower triangular matrices; scalable block LU decomposition architecture; upper triangular matrices; Algorithm design and analysis; Field programmable gate arrays; Hardware; Matrix decomposition; Memory management; Parallel processing; ATLAS; FPGA; GPU; Intel-MKL; LU decomposition; block LU; floating point arithmetics; hardware acceleration; scaling; single/double precision;
Journal_Title :
Computers, IEEE Transactions on