Operand-Load-Based Split Pipeline Architecture for High Clock Rate and Commensurable IPC

Author

Sangireddy, Rama ; Shah, Jatan

Author_Institution

Univ. of Texas at Dallas, Richardson

Volume

19

Issue

4

fYear

2008

fDate

4/1/2008

Firstpage

529

Lastpage

544

Abstract

The growth in complexity of a wide-issue processor with its pipeline width is one of the primary concerns of processor designers. In the conventional design, the hardware in the processor core is laid out to handle multiple instructions with two source operands in each pipeline stage. However, an analysis of SPEC2000 programs reveals that, on average, an integer program consists of 25.2 percent two-op integer instructions (both source operands in registers) and 72.5 percent one-op/zero-op integer instructions. Floating-point (FP) programs consist, on average, of 15.8 percent two-op integer instructions and 44.1 percent one-op/zero-op integer instructions. The analysis shows that the hardware laid out for worst-case requirements in the integer pipeline is heavily underutilized for a significant portion of the time. To alleviate these complexity issues, we propose the split pipeline architecture, a novel technique that distinguishes and processes instructions based on their source operand requirements. The conventional pipeline is split into two after the decode stage, and the two pipelines converge again at the execution stage. This makes it possible to process instructions at a higher clock rate and at almost the same instruction-per-cycle (IPC) throughput as a conventional processor. Several variants of the proposed architecture are simulated and analyzed in this paper, with a circuit-level analysis to determine the impact on critical path delays. Results show that a processor that can fetch, decode, and commit eight instructions per cycle, with split pipelines for two two-source integer instructions and six zero/one-source integer instructions, can achieve a clock rate that is 15.8 percent faster than an eight-wide conventional processor while reducing the IPC throughput by only 0.7 percent for SPEC2000 benchmarks.
Similarly, a four-wide processor with split pipelines of one two-source integer instruction and three zero/one-source integer instructions can achieve a clock rate that is 19.69 percent faster than a four-wide conventional processor while reducing the IPC throughput by only 1.9 percent.
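
The clock-rate and IPC figures quoted in the abstract can be combined into a single net-throughput estimate. The sketch below is not taken from the paper. It assumes only the standard identity that instructions per second equal IPC times clock rate, and the function name net_speedup is a hypothetical helper introduced for illustration.

```python
def net_speedup(clock_gain_pct: float, ipc_loss_pct: float) -> float:
    """Relative instructions-per-second versus the conventional baseline.

    Assumption (not from the paper): throughput = IPC x clock rate, so the
    clock-rate gain and the IPC loss combine multiplicatively.
    """
    return (1 + clock_gain_pct / 100) * (1 - ipc_loss_pct / 100)


# Figures quoted in the abstract.
eight_wide = net_speedup(clock_gain_pct=15.8, ipc_loss_pct=0.7)
four_wide = net_speedup(clock_gain_pct=19.69, ipc_loss_pct=1.9)

print(f"8-wide split pipeline: {100 * (eight_wide - 1):.1f}% net gain")
print(f"4-wide split pipeline: {100 * (four_wide - 1):.1f}% net gain")
```

Under that assumption, the small IPC loss barely offsets the clock-rate gain: the estimate comes to roughly 15 percent for the eight-wide configuration and roughly 17 percent for the four-wide one.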

Keywords

multiprocessing programs; multiprocessing systems; pipeline processing; circuit level analysis; critical path delay; floating-point program; high clock rate; instruction-per-cycle throughput; integer instruction; operand-load-based split pipeline architecture; pipeline width; processor core hardware; processor design; source operand; source register; wide-issue processor

fLanguage

English

Journal_Title

Parallel and Distributed Systems, IEEE Transactions on

Publisher

IEEE

ISSN

1045-9219

Type

jour

DOI

10.1109/TPDS.2007.70742

Filename

4359960