• DocumentCode
    1325822
  • Title

    FPGA accelerator for floating-point matrix multiplication

  • Author

    Jovanovic, Zoran ; Milutinovic

  • Author_Institution
    University of Belgrade, Serbia
  • Volume
    6
  • Issue
    4
  • fYear
    2012
  • fDate
    7/1/2012 12:00:00 AM
  • Firstpage
    249
  • Lastpage
    256
  • Abstract
    This study treats architecture and implementation of a field-programmable gate array (FPGA) accelerator for doubleprecision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering and simplifies placement and routing on the chip. The authors show that such architecture is especially well suited for full-duplex communication links between the accelerator and the host processor. The architecture requires the result blocks to be accumulated by the host processor; however, the authors show that typically more than 99% of all arithmetic operations are performed by the accelerator. The implementation focuses on efficient use of embedded FPGA resources, in order to allow for a large number of processing elements (PEs). Each PE uses eight Virtex-6 DSP blocks. Both adders and multipliers are deeply pipelined and use several FPGA-specific techniques to achieve small area size and high clock frequency. Finally, the authors quantify the performance of accelerator implemented in Xilinx Virtex-6 FPGA, with 252 PEs running at 403 MHz (achieving 203.1 Giga FLOPS (GFLOPS)), by comparing it to double-precision matrix multiplication function from MKL, ACML, GotoBLAS and ATLAS libraries executing on Intel Core2Quad and AMD Phenom X4 microprocessors running at 2.8 GHz. The accelerator performs 4.5 times faster than the fastest processor/library pair.
  • fLanguage
    English
  • Journal_Title
    Computers & Digital Techniques, IET
  • Publisher
    iet
  • ISSN
    1751-8601
  • Type

    jour

  • DOI
    10.1049/iet-cdt.2011.0132
  • Filename
    6337379