DocumentCode
2491287
Title
Blocking LU Decomposition for FPGAs
Author
Wu, Guiming ; Dou, Yong ; Peterson, Gregory D.
Author_Institution
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China
fYear
2010
fDate
2-4 May 2010
Firstpage
109
Lastpage
112
Abstract
To perform LU decomposition of large matrices efficiently on FPGAs with limited local memory, the original algorithm needs to be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs that is applicable to matrices of arbitrary size. We introduce a high-performance hardware design, consisting mainly of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work.
Keywords
field programmable gate arrays; matrix decomposition; 8.50 GFLOPS; FPGA; PCI-Express card; Xilinx Virtex-5 xc5vlx330; block LU decomposition algorithm; high performance hardware design; processing elements; Algorithm design and analysis; Computer applications; Concurrent computing; Distributed computing; Field programmable gate arrays; Hardware; Laboratories; Linear algebra; Matrix decomposition; Power engineering computing
fLanguage
English
Publisher
IEEE
Conference_Title
Field-Programmable Custom Computing Machines (FCCM), 2010 18th IEEE Annual International Symposium on
Conference_Location
Charlotte, NC
Print_ISBN
978-0-7695-4056-6
Electronic_ISBN
978-1-4244-7143-0
Type
conf
DOI
10.1109/FCCM.2010.25
Filename
5474061