• DocumentCode
    1958471
  • Title
    High performance and memory efficient implementation of matrix multiplication on FPGAs
  • Author
    Wu, Guiming ; Dou, Yong ; Wang, Miao

  • Author_Institution
    Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2010
  • fDate
    8-10 Dec. 2010
  • Firstpage
    134
  • Lastpage
    137
  • Abstract
    We present a high-performance, memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared to related work while increasing clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
  • Keywords
    circuit optimisation; field programmable gate arrays; matrix multiplication; FPGA devices; clock frequency; memory optimized block algorithm; processing elements; serial algorithm; Algorithm design and analysis; Arrays; Field programmable gate arrays; Hardware; Memory management; Optimization; Random access memory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Field-Programmable Technology (FPT), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-8980-0
  • Type
    conf
  • DOI
    10.1109/FPT.2010.5681769
  • Filename
    5681769