• DocumentCode
    1532980
  • Title
    Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures
  • Author
    Pedram, Ardavan ; Van de Geijn, Robert A. ; Gerstlauer, Andreas
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
  • Volume
    61
  • Issue
    12
  • fYear
    2012
  • Firstpage
    1724
  • Lastpage
    1736
  • Abstract
    As technology is reaching physical limits, reducing power consumption is a key issue on our path to sustained performance. In this paper, we study fundamental tradeoffs and limits in efficiency (as measured in energy per operation) that can be achieved for an important class of kernels, namely the level-3 Basic Linear Algebra Subprograms (BLAS). It is well accepted that specialization is the key to efficiency. This paper establishes a baseline by studying GEneral Matrix-matrix Multiplication (GEMM) on a variety of custom and general-purpose CPU and GPU architectures. Our analysis shows that orders-of-magnitude improvements in efficiency are possible with relatively simple customizations and fine-tuning of memory hierarchy configurations. We argue that these customizations can be generalized to perform other representative linear algebra operations. In addition to exposing the sources of inefficiencies in current CPUs and GPUs, our results show that our prototype Linear Algebra Processor (LAP) implementing Double-precision GEMM (DGEMM) can achieve 600 GFLOPS while consuming less than 25 watts in standard 45 nm technology, which is up to 50× more energy efficient than cutting-edge CPUs.
  • Keywords
    graphics processing units; linear algebra; low-power electronics; matrix multiplication; memory architecture; power consumption; BLAS; DGEMM; GFLOPS; GPU architecture; LAP; codesign tradeoff; double-precision GEMM; general matrix-matrix multiplication; general-purpose CPU; high-performance linear algebra architecture; level-3 basic linear algebra subprograms; linear algebra operation; linear algebra processor; low-power linear algebra architecture; memory hierarchy configuration; power consumption; size 45 nm; Algorithm design and analysis; Bandwidth; Energy efficiency; Energy management; Field programmable gate arrays; Linear algebra; Low power electronics; Memory management; System-on-a-chip; Low-power design; energy-aware systems; level-3 BLAS; matrix multiplication; memory hierarchy; performance analysis and design aids; special-purpose hardware
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2012.132
  • Filename
    6212466