• DocumentCode
    774560
  • Title
    Energy- and time-efficient matrix multiplication on FPGAs
  • Author
    Jang, Ju-wook ; Choi, Seonil B. ; Prasanna, Viktor K.
  • Author_Institution
    Dept. of Electron. Eng., Sogang Univ., Seoul, South Korea
  • Volume
    13
  • Issue
    11
  • fYear
    2005
  • Firstpage
    1305
  • Lastpage
    1319
  • Abstract
    We develop new algorithms and architectures for matrix multiplication on configurable devices. They reduce energy dissipation and latency compared with state-of-the-art field-programmable gate array (FPGA)-based designs. By profiling well-known designs, we identify "energy hot spots", the components responsible for most of the energy dissipation. Based on this, we develop algorithms and architectures that offer tradeoffs among the number of I/O ports, the number of registers, and the number of processing elements (PEs). To avoid time-consuming low-level simulations for energy profiling and performance prediction of many alternative designs, we derive functions that represent the impact of algorithm design choices on system-wide energy dissipation, area, and latency. These functions are used either to optimize energy performance or to provide tradeoffs for a family of candidate algorithms and architectures. For selected designs, we perform extensive low-level simulations using state-of-the-art tools and target FPGA devices. We show a design space for matrix multiplication on FPGAs that yields tradeoffs among energy, area, and latency. For example, our designs improve the energy performance of state-of-the-art FPGA-based designs by 29%-51% without any increase in the area-latency product. The latency of our designs is reduced to between one-third and one-fifteenth, while area increases by a factor of 1.9 to 9.4. In terms of comprehensive metrics such as Energy-Area-Time, our designs outperform the state of the art by 50%-79%.
  • Keywords
    field programmable gate arrays; integrated circuit design; logic design; matrix multiplication; FPGA device; configurable hardware; energy hot spots; energy profiling; energy-delay tradeoff; energy-efficient matrix multiplication; field-programmable gate array; linear array; system-wide energy dissipation; time-efficient matrix multiplication; Algorithm design and analysis; Delay; Energy dissipation; Hardware; Image processing; Mobile computing; Predictive models; Signal processing; Signal processing algorithms; Algorithm design; field-programmable gate array (FPGA); performance estimation
  • fLanguage
    English
  • Journal_Title
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-8210
  • Type
    jour
  • DOI
    10.1109/TVLSI.2005.859562
  • Filename
    1564083