  • DocumentCode
    2491287
  • Title
    Blocking LU Decomposition for FPGAs
  • Author
    Wu, Guiming ; Dou, Yong ; Peterson, Gregory D.
  • Author_Institution
    Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2010
  • fDate
    2-4 May 2010
  • Firstpage
    109
  • Lastpage
    112
  • Abstract
    To perform LU decomposition of large matrices efficiently on FPGAs with limited local memory, the original algorithm needs to be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs that is applicable to matrices of arbitrary size. We introduce a high-performance hardware design, consisting mainly of a linear array of processing elements (PEs), to implement the algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work.
  • Keywords
    field programmable gate arrays; matrix decomposition; 8.50 GFLOPS; FPGA; PCI-Express card; Xilinx Virtex-5 xc5vlx330; block LU decomposition algorithm; high performance hardware design; processing elements; Algorithm design and analysis; Computer applications; Concurrent computing; Distributed computing; Field programmable gate arrays; Hardware; Laboratories; Linear algebra; Matrix decomposition; Power engineering computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Field-Programmable Custom Computing Machines (FCCM), 2010 18th IEEE Annual International Symposium on
  • Conference_Location
    Charlotte, NC
  • Print_ISBN
    978-0-7695-4056-6
  • Electronic_ISBN
    978-1-4244-7143-0
  • Type
    conf
  • DOI
    10.1109/FCCM.2010.25
  • Filename
    5474061
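
The abstract states that LU decomposition must be blocked when local memory is limited, but this record does not give the algorithm itself. As a rough illustration of what blocking means, the following is a minimal software sketch of a generic right-looking block LU factorization without pivoting. It is not the authors' FPGA design or PE-array schedule; the function names `lu_unblocked` and `block_lu` and the block-size parameter `b` are hypothetical, and the absence of pivoting is an assumption made to keep the sketch short.

```python
def lu_unblocked(a, lo, hi):
    """In-place LU (unit lower L, no pivoting) of the diagonal block a[lo:hi][lo:hi]."""
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            a[i][k] /= a[k][k]                    # multiplier, stored in place of L
            for j in range(k + 1, hi):
                a[i][j] -= a[i][k] * a[k][j]


def block_lu(a, b):
    """In-place block LU of a square list-of-lists matrix with block size b.

    Assumption: no pivoting, so every pivot met during elimination must be nonzero.
    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    n = len(a)
    for lo in range(0, n, b):
        hi = min(lo + b, n)                       # last block may be smaller than b
        # 1. Factor the diagonal block: A11 = L11 * U11.
        lu_unblocked(a, lo, hi)
        # 2. Row panel: solve L11 * U12 = A12 by forward substitution.
        for j in range(hi, n):
            for k in range(lo, hi):
                for i in range(k + 1, hi):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3. Column panel: solve L21 * U11 = A21.
        for i in range(hi, n):
            for k in range(lo, hi):
                a[i][k] /= a[k][k]
                for j in range(k + 1, hi):
                    a[i][j] -= a[i][k] * a[k][j]
        # 4. Trailing update: A22 -= L21 * U12.
        for i in range(hi, n):
            for j in range(hi, n):
                s = 0.0
                for k in range(lo, hi):
                    s += a[i][k] * a[k][j]
                a[i][j] -= s
    return a
```

In this sketch, each outer iteration touches only one diagonal block, one row panel, one column panel, and the trailing submatrix, which is the property that lets a blocked formulation work on pieces that fit in a small local memory. The arbitrary matrix size mentioned in the abstract corresponds here only to the `min(lo + b, n)` handling of a final partial block.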