  • DocumentCode
    680062
  • Title
    Energy-efficient large-scale matrix multiplication on FPGAs
  • Author
    Matam, Kiran Kumar ; Prasanna, Viktor K.
  • Author_Institution
    Comput. Sci. Dept., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2013
  • fDate
    9-11 Dec. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Energy efficiency has emerged as one of the key performance metrics in computing. In this work, we present an energy-efficient design for large-scale matrix multiplication. As a baseline architecture, we use a highly optimized on-chip matrix multiplication architecture, extended to support large matrices using external memory. Based on the matrix multiplication algorithm and the DRAM model, we present an efficient data layout for storing the input matrices. This data layout reduces the energy consumed by the external memory by minimizing the number of DRAM row activations. By exploiting the matrix multiplication algorithm, the modular structure of the DRAM, and the high bandwidth between the on-chip and external memory, we propose a memory activation schedule. This schedule is based on a realistic DRAM model and reduces the memory energy, which dominates the energy consumption of the design. Our proposed scheme improves the energy efficiency (defined as the number of operations per Joule) of the baseline architecture by 1.6×, 1.3×, and 1.2× for 32K×32K 16-bit fixed-point, 32K×32K single-precision floating-point, and 16K×16K double-precision floating-point matrix multiplication, respectively.
  • Keywords
    DRAM chips; field programmable gate arrays; fixed point arithmetic; floating point arithmetic; mathematics computing; matrix multiplication; performance evaluation; power aware computing; FPGA; baseline architecture; data layout; double precision floating point matrix multiplication; efficient data layout; energy consumption; energy efficient design; energy-efficient large-scale matrix multiplication; external memory; fixed point matrix multiplication; highly optimized on-chip matrix multiplication architecture; memory activation schedule; memory energy; modular DRAM structure; performance metrics; realistic DRAM model; single precision floating point matrix multiplication; Algorithm design and analysis; DRAM chips; Field programmable gate arrays; Layout; Schedules; System-on-chip;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reconfigurable Computing and FPGAs (ReConFig), 2013 International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4799-2078-5
  • Type
    conf
  • DOI
    10.1109/ReConFig.2013.6732284
  • Filename
    6732284
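The abstract's claim that a data layout can reduce external-memory energy by minimizing DRAM row activations can be illustrated with a toy model. The sketch below is not the authors' layout, schedule, or DRAM model. It assumes a single bank with an open-page policy, a hypothetical row size `row_size` measured in matrix elements, and square `b`×`b` tiles. It counts how often the open DRAM row changes when one tile is read under a plain row-major layout and under a hypothetical block-contiguous ("block-major") layout. All function names are illustrative.

```python
from typing import Callable, Iterable

# Hypothetical address functions: map matrix element (i, j) of an n x n matrix
# to a linear element address in external memory.
AddrFn = Callable[[int, int, int, int], int]


def row_major_addr(i: int, j: int, n: int, b: int) -> int:
    """Conventional row-major layout (tile size b is unused here)."""
    return i * n + j


def block_major_addr(i: int, j: int, n: int, b: int) -> int:
    """Hypothetical block-contiguous layout: each b x b tile occupies
    b*b consecutive addresses, with tiles ordered row by row."""
    bi, bj = i // b, j // b          # tile coordinates
    ii, jj = i % b, j % b            # offsets inside the tile
    tiles_per_row = n // b
    return (bi * tiles_per_row + bj) * b * b + ii * b + jj


def tile_addresses(layout: AddrFn, n: int, b: int, bi: int, bj: int) -> Iterable[int]:
    """Addresses touched when tile (bi, bj) is read row by row."""
    for ii in range(b):
        for jj in range(b):
            yield layout(bi * b + ii, bj * b + jj, n, b)


def count_activations(addrs: Iterable[int], row_size: int) -> int:
    """Toy single-bank, open-page model: an activation occurs whenever
    the accessed DRAM row differs from the currently open one."""
    activations = 0
    open_row = None
    for a in addrs:
        dram_row = a // row_size
        if dram_row != open_row:
            activations += 1
            open_row = dram_row
    return activations


if __name__ == "__main__":
    n, b, row_size = 1024, 32, 1024  # illustrative sizes, not from the paper
    for name, layout in [("row-major", row_major_addr), ("block-major", block_major_addr)]:
        acts = count_activations(tile_addresses(layout, n, b, 1, 2), row_size)
        print(f"{name}: {acts} activation(s) to read one {b}x{b} tile")
```

With these illustrative sizes, each tile row in the row-major layout falls in a different DRAM row, so reading one tile costs 32 activations. In the block-major layout the 1,024 tile elements are contiguous and row-aligned, so the tile costs a single activation. A real DRAM has multiple banks, bursts, and refresh, so this only shows the direction of the effect, not its magnitude.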