DocumentCode :
952828
Title :
Operand-Load-Based Split Pipeline Architecture for High Clock Rate and Commensurable IPC
Author :
Sangireddy, Rama ; Shah, Jatan
Author_Institution :
Univ. of Texas at Dallas, Richardson
Volume :
19
Issue :
4
fYear :
2008
fDate :
4/1/2008 12:00:00 AM
Firstpage :
529
Lastpage :
544
Abstract :
The increase in the complexity of a wide-issue processor with its pipeline width is one of the primary concerns of processor designers. In the conventional design, the hardware in the processor core is laid out to handle multiple instructions with two source operands in each pipeline stage. However, an analysis of SPEC2000 programs reveals that an integer program consists, on average, of 25.2 percent two-op (both source registers) integer instructions and 72.5 percent one-op/zero-op integer instructions. Floating-point (FP) programs consist, on average, of 15.8 percent two-op integer instructions and 44.1 percent one-op/zero-op integer instructions. The analysis shows that the hardware laid out for worst-case requirements in the integer pipeline is highly underutilized for a significant portion of the time. To alleviate the complexity issues, we propose the split pipeline architecture, a novel technique to distinguish and process instructions based on their source operand requirements. The conventional pipeline is split into two after the decode stage, and the two pipelines converge again at the execution stage. This makes it possible to process instructions at a higher clock rate and at almost the same instruction-per-cycle (IPC) throughput as a conventional processor. Various flavors of the proposed architecture are simulated and analyzed in this paper, with a circuit-level analysis to determine the impact on the critical path delays. Results show that a processor that can fetch, decode, and commit eight instructions in each cycle, with split pipelines of two two-source integer instructions and six zero/one-source integer instructions, can achieve a clock rate that is 15.8 percent faster than an eight-wide conventional processor while reducing the IPC throughput by only 0.7 percent for SPEC2000 benchmarks.
Similarly, a four-wide processor with split pipelines of one two-source integer instruction and three zero/one-source integer instructions can achieve a clock rate that is 19.69 percent faster than a four-wide conventional processor while reducing the IPC throughput by only 1.9 percent.
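The abstract reports clock-rate gains and IPC losses separately. As a rough illustration of how the two might combine, the sketch below assumes that throughput scales as IPC times clock frequency for a fixed instruction count, and that "X percent faster" means the frequency is multiplied by (1 + X/100). Both assumptions, and the function name net_speedup, are illustrative and are not taken from the paper.

```python
def net_speedup(clock_gain_pct: float, ipc_loss_pct: float) -> float:
    """Estimate the fractional net throughput change of a split-pipeline design.

    Assumes throughput is proportional to IPC * clock frequency
    (an assumption of this sketch, not a claim from the paper).
    """
    clock_factor = 1.0 + clock_gain_pct / 100.0  # e.g. 15.8 -> 1.158
    ipc_factor = 1.0 - ipc_loss_pct / 100.0      # e.g. 0.7  -> 0.993
    return clock_factor * ipc_factor - 1.0


if __name__ == "__main__":
    # Figures quoted in the abstract for the two configurations.
    for label, clock_gain, ipc_loss in [
        ("eight-wide (2 two-source + 6 zero/one-source)", 15.8, 0.7),
        ("four-wide (1 two-source + 3 zero/one-source)", 19.69, 1.9),
    ]:
        gain = net_speedup(clock_gain, ipc_loss)
        print(f"{label}: estimated net throughput change {gain * 100:.2f}%")
```

Under these assumptions, the small IPC loss offsets only a small part of the clock-rate gain in both configurations; the paper itself should be consulted for measured end-to-end performance.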
Keywords :
multiprocessing programs; multiprocessing systems; pipeline processing; circuit level analysis; critical path delay; floating-point program; high clock rate; instruction-per-cycle throughput; integer instruction; operand-load-based split pipeline architecture; pipeline width; processor core hardware; processor design; source operand; source register; wide-issue processor;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2007.70742
Filename :
4359960