Author :
Hayenga, Mitchell ; Naresh, Vignyan Reddy Kothinti ; Lipasti, Mikko H.
Abstract :
With the rise of mobile and cloud-based computing, modern processor design has become the task of achieving maximum power efficiency at specific performance targets. This trend, coupled with dwindling improvements in single-threaded performance, has led architects to focus predominantly on energy efficiency. In this paper, we note that for the majority of benchmarks, a substantial portion of execution time is spent executing simple loops. Capitalizing on the frequency of loops, we design an out-of-order processor architecture that achieves an aggressive level of performance while minimizing the energy consumed during the execution of loops. The Revolver architecture achieves energy efficiency during loop execution by enabling "in-place execution" of loops within the processor's out-of-order backend. Essentially, a few static instances of each loop instruction are dispatched to the out-of-order execution core by the processor frontend. The static instruction instances may each be executed multiple times in order to complete all necessary loop iterations. During loop execution, the processor frontend, including instruction fetch, branch prediction, decode, allocation, and dispatch logic, can be completely clock gated. Additionally, we propose a mechanism to preexecute future loop iteration load instructions, thereby realizing parallelism beyond the loop iterations currently executing within the processor core. Employing Revolver across three benchmark suites, we eliminate 20%, 55%, and 84% of all frontend instruction dispatches. Overall, we find that Revolver maintains performance while providing a 5.3%-18.3% energy-delay benefit over loop buffers or micro-op cache techniques alone.
Keywords :
computer architecture; energy conservation; instruction sets; power aware computing; Revolver architecture; branch prediction; dispatch logic; energy efficiency; frontend instruction dispatches; instruction fetch; loop buffers; loop execution; loop iterations; micro-op cache techniques; out-of-order execution core; out-of-order processor architecture; power efficient loop execution; processor core; processor frontend; static instruction instances; Arrays; Clocks; Out of order; Pipelines; Rain; Registers; Resource management;