• DocumentCode
    3678617
  • Title

    Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors

  • Author

    Ivan Ratkovic;Oscar Palomar;Milan Stanic;Milovan Duric;Djordje Peic;Osman Unsal;Adrian Cristal;Mateo Valero

  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    19
  • Lastpage
    26
  • Abstract
    Although touted as a power- and energy-efficient solution for workloads that exhibit data-level parallelism, vector processors have not been sufficiently explored from a low-power perspective. There is therefore a need to explore vector computational units from a low-power angle. Multimedia workloads that are suitable for vector processing (such as image processing) typically have multiplication as a fundamental operation. In this paper, we perform a joint circuit-architecture design space exploration of the vector multiplier unit (VMU). For this exploration, we use various circuit- and architecture-level parameters (e.g., multiplier family and maximum vector length), tools and simulators for a 40 nm low-power technology, and the San Diego Vision Benchmark Suite. We examine the advantages and side effects of using multiple vector lanes and show how it performs across the frequency spectrum to achieve an energy- and thermal-efficient speed-up. As the final result of our exploration, we derive Pareto-optimal VMU design points. Among other findings, our exploration reveals that a Wallace VMU with 4 vector lanes and 2 pipeline stages is an optimal choice for fast, low-power mobile vector processors, while a single-lane Carry-Save Array VMU is efficient for very low power and frequency requirements.
  • Keywords
    "Vector processors","Pipeline processing","Timing","Power dissipation","Space exploration","Benchmark testing"
  • Publisher
    IEEE
  • Conference_Titel
    VLSI (ISVLSI), 2015 IEEE Computer Society Annual Symposium on
  • Type

    conf

  • DOI
    10.1109/ISVLSI.2015.23
  • Filename
    7308672