Title :
The Ultrascalar processor: an asymptotically scalable superscalar microarchitecture
Author :
Henry, Dana S. ; Kuszmaul, Bradley C. ; Viswanath, Vinod
Author_Institution :
Depts. of Comput. Sci. & Electr. Eng., Yale Univ., New Haven, CT, USA
Abstract :
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path lengths of many components in existing implementations grow as $\Theta(n^2)$, where $n$ is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected together by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations. These networks provide the full functionality of superscalar processors, including renaming, out-of-order execution, and speculative execution. The Ultrascalar's critical-path length due to gate delays is $\tau_{\mathrm{gates}} = \Theta(\log n)$. The wire delays and chip size depend on the provided memory bandwidth and the layout. If the provided memory bandwidth is $M(n)$ memory operations per clock cycle then, using an H-tree VLSI layout, the critical-path length due to wire delay (speed-of-light delay) is
$$
\tau_{\mathrm{wires}} =
\begin{cases}
\Theta(n^{1/2}) & \text{if } M(n) \text{ is } O(n^{1/2-\epsilon}) \text{ for } \epsilon > 0 \quad \text{[optimal]}, \\
\Theta(n^{1/2} \log n) & \text{if } M(n) \text{ is } \Theta(n^{1/2}) \quad \text{[near optimal]}, \\
\Theta(M(n)) & \text{if } M(n) \text{ is } \Omega(n^{1/2+\epsilon}) \text{ for } \epsilon > 0 \quad \text{[optimal]}
\end{cases}
$$
(with $M$ suitably constrained). The area is the square of the wire delay.
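As a rough illustration of why a parallel-prefix network can have logarithmic depth, the sketch below runs an inclusive prefix scan in doubling rounds and counts them. It is not the circuit from the paper: it uses a Kogge-Stone-style doubling scan rather than a tree circuit, and the names `prefix_scan` and `last_writer`, as well as the "most recent writer wins" operator used to mimic register renaming, are assumptions made only for this example.

```python
def last_writer(left, right):
    """Associative operator (illustrative assumption): keep the value from
    the later station if it wrote the register, else pass the earlier one."""
    return right if right is not None else left


def prefix_scan(values, op):
    """Inclusive prefix scan computed in doubling rounds.

    Returns (results, rounds). Each round stands in for one level of
    combining logic, so `rounds` grows as ceil(log2(n)) for n inputs.
    """
    out = list(values)
    n = len(out)
    rounds = 0
    step = 1
    while step < n:
        prev = out[:]  # all combinations in a round read the previous round
        for i in range(step, n):
            out[i] = op(prev[i - step], prev[i])
        step *= 2
        rounds += 1
    return out, rounds


# Eight hypothetical execution stations; None means "does not write the register".
writes = [5, None, None, 7, None, None, None, 9]
visible, depth = prefix_scan(writes, last_writer)
print(visible, depth)
```

With eight stations the scan finishes in three rounds, so every station learns the most recent value written at or before its position after a number of combining levels that is logarithmic in the station count.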
Keywords :
CMOS digital integrated circuits; VLSI; delay estimation; integrated circuit layout; microprocessor chips; parallel architectures; ALUs; H-tree VLSI layout; Ultrascalar processor; asymptotic critical-path length reduction; asymptotically scalable superscalar microarchitecture; controllers; execution stations; fat-tree network; gate delays; interleaved cache; memory bandwidth; out-of-order execution; parallel-prefix tree circuits; renaming; speculative execution; wire delay; Bandwidth; Circuits; Clocks; Computer science; Delay; Delay lines; Design optimization; Microarchitecture; Out of order; Parallel processing; Registers; Scalability; Very large scale integration; Wire;
Conference_Title :
Advanced Research in VLSI, 1999. Proceedings. 20th Anniversary Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7695-0056-0
DOI :
10.1109/ARVLSI.1999.756053