• DocumentCode
    3264609
  • Title
    A centralized cache miss driven technique to improve processor power dissipation
  • Author
    Homayoun, Houman ; Makhzan, Mohammad ; Gaudiot, Jean-Luc ; Veidenbaum, Alex
  • Author_Institution
    Dept. of Electr. & Comput. Eng., UC, Irvine, CA
  • fYear
    2008
  • fDate
    21-24 July 2008
  • Firstpage
    195
  • Lastpage
    202
  • Abstract
    Leakage and dynamic power are a major challenge in microprocessor design. Many circuit techniques, along with micro-architectural innovations, have been proposed to reduce power in individual processor units, but it is not clear that these techniques can be combined. A centralized approach that can reduce power in more than one unit at a time with minimal hardware overhead is needed. This paper proposes such a centralized approach, which attempts to simultaneously reduce power in the processor units with the highest dissipation: the reorder buffer, the instruction queue, and the integer and floating-point register files. It is based on the observation that utilization of these units varies significantly between periods when an L2 cache miss or multiple L1 cache misses are pending and periods when none are present. We therefore propose to dynamically adjust the size, and thus the power dissipation, of these resources during such periods. The circuit-level modifications required for such resource adaptation are presented. Simulation results for SPEC2K benchmarks show a substantial reduction in both leakage and dynamic power: the total dynamic power is reduced by as much as 30, 31, 31 and 48% for the reorder buffer, the integer register file, the floating-point register file and the instruction queue, respectively. The reduction in leakage is up to 33% for the reorder buffer and 37% for the integer and floating-point register files. The total energy-delay product is reduced, on average, by 15, 26, 20 and 17% for the reorder buffer, the integer register file, the floating-point register file and the instruction queue, respectively. This comes at the cost of a performance impact that is as low as 0.9% for integer and 2.2% for floating-point benchmarks. The required hardware modification is shown to be minimal.
  • Keywords
    cache storage; floating point arithmetic; microprocessor chips; SPEC2K benchmarks; centralized approach; centralized cache miss driven technique; circuit level modifications; floating-point register files; instruction queue; microprocessor design; processor power dissipation; reorder buffer; resource adaptation; Circuits; Computer science; Hardware; Monitoring; Multicore processing; Performance loss; Power dissipation; Registers; Resource management; Voltage;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Embedded Computer Systems: Architectures, Modeling, and Simulation, 2008. SAMOS 2008. International Conference on
  • Conference_Location
    Samos
  • Print_ISBN
    978-1-4244-1985-2
  • Type
    conf
  • DOI
    10.1109/ICSAMOS.2008.4664864
  • Filename
    4664864