A centralized cache miss driven technique to improve processor power dissipation

Author

Homayoun, Houman ; Makhzan, Mohammad ; Gaudiot, Jean-Luc ; Veidenbaum, Alex

Author_Institution

Dept. of Electr. & Comput. Eng., UC, Irvine, CA

fYear

2008

fDate

21-24 July 2008

Firstpage

195

Lastpage

202

Abstract

Leakage and dynamic power are a major challenge in microprocessor design. Many circuit techniques, along with micro-architectural innovations, have been proposed to reduce power in individual processor units. However, it is not clear that these techniques can be combined. A centralized approach that can reduce power in more than one unit at a time with minimal hardware overhead is needed. This paper proposes such a centralized approach, which attempts to simultaneously reduce power in the processor units with the highest dissipation: the reorder buffer, the instruction queue, and the integer and floating-point register files. It is based on the observation that the utilization of these units differs significantly between periods when an L2 cache miss or multiple L1 cache misses are pending and periods when none are pending. We therefore propose to dynamically adjust the size, and thus the power dissipation, of these resources during such periods. The circuit-level modifications required for such resource adaptation are presented. Simulation results for SPEC2K benchmarks show a substantial reduction in both leakage and dynamic power: the total dynamic power is reduced by as much as 30%, 31%, 31%, and 48% for the reorder buffer, the integer register file, the floating-point register file, and the instruction queue, respectively. The reduction in leakage is up to 33% for the reorder buffer and 37% for the integer and floating-point register files. The total energy-delay product is reduced, on average, by 15%, 26%, 20%, and 17% for the reorder buffer, the integer register file, the floating-point register file, and the instruction queue, respectively. This comes at the cost of a performance impact that is as low as 0.9% for integer and 2.2% for floating-point benchmarks. The required hardware modification is shown to be minimal.
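
The miss-driven resizing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MissDrivenResizer`, its fields, and the threshold of two pending L1 misses for "multiple" are all hypothetical. The abstract does not state which period uses the larger size, so both sizes are left as caller-supplied parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissDrivenResizer:
    """Hypothetical size selector for one resource (e.g. the reorder buffer).

    All names and values are illustrative assumptions, not taken from the paper.
    """
    size_in_miss_period: int    # entries kept active while qualifying misses are pending
    size_otherwise: int         # entries kept active when no qualifying miss is pending
    l1_miss_threshold: int = 2  # "multiple" L1 misses, assumed here to mean two or more

    def in_miss_period(self, l2_misses_pending: int, l1_misses_pending: int) -> bool:
        # A miss period is one pending L2 miss, or several pending L1 misses.
        return l2_misses_pending >= 1 or l1_misses_pending >= self.l1_miss_threshold

    def active_entries(self, l2_misses_pending: int, l1_misses_pending: int) -> int:
        # Pick the resource size from the current cache-miss state alone.
        if self.in_miss_period(l2_misses_pending, l1_misses_pending):
            return self.size_in_miss_period
        return self.size_otherwise


# Example with made-up sizes: the same miss signals could drive every unit.
rob = MissDrivenResizer(size_in_miss_period=128, size_otherwise=64)
print(rob.active_entries(l2_misses_pending=1, l1_misses_pending=0))
print(rob.active_entries(l2_misses_pending=0, l1_misses_pending=1))
```

Because the decision depends only on the pending-miss counts, one such signal could in principle be shared by the reorder buffer, instruction queue, and register files, which is the sense in which the approach is centralized.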

Keywords

cache storage; floating point arithmetic; microprocessor chips; SPEC2K benchmarks; centralized approach; centralized cache miss driven technique; circuit level modifications; floating-point register files; instruction queue; microprocessor design; processor power dissipation; reorder buffer; resource adaptation; Circuits; Computer science; Hardware; Monitoring; Multicore processing; Performance loss; Power dissipation; Registers; Resource management; Voltage;

fLanguage

English

Publisher

IEEE

Conference_Titel

Embedded Computer Systems: Architectures, Modeling, and Simulation, 2008. SAMOS 2008. International Conference on

Conference_Location

Samos

Print_ISBN

978-1-4244-1985-2

Type

conf

DOI

10.1109/ICSAMOS.2008.4664864

Filename

4664864