Title :
The Floating-Point Unit of the Jaguar x86 Core
Author :
Rupley, J. ; King, Jacob ; Quinnell, E. ; Galloway, F. ; Patton, K. ; Seidel, P. ; Dinh, J. ; Hai Bui ; Bhowmik, Abhijit
Author_Institution :
AMD Austin & Bangalore, Bangalore, India
Abstract :
The AMD Jaguar x86 core uses a fully-synthesized, 128-bit native floating-point unit (FPU) built as a co-processor model. The Jaguar FPU supports several x86 ISA extensions, including the x87, MMX, SSE1 through SSE4.2, AES, CLMUL, AVX, and F16C instruction sets. The front end of the unit decodes two complex operations per cycle and uses a dedicated renamer (RN), free list (FL), and retire queue (RQ) for in-order dispatch and retire. The FPU issues to the execution units with a dedicated out-of-order, dual-issue scheduler. Execution units source operands from a synthesized physical register file (PRF) and bypass network. The back end of the unit has two execution pipes: the first pipe contains a vector integer ALU, a vector integer MUL unit, and a floating-point adder (FPA); the second pipe contains a vector integer ALU, a store-convert unit, and a floating-point iterative multiplier (FPM). The implementation of the unit focused on low-power design and on vectorized single-precision (SP) performance optimizations. The verification of the unit required complex pseudo-random and formal verification techniques. The Jaguar FPU is built in a 28 nm CMOS process.
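The interplay of renamer, free list, and retire queue named in the abstract can be illustrated with a toy model. This is only a sketch of the general register-renaming technique, not the Jaguar design: the class name `Renamer`, its methods, and the register counts are all hypothetical, and the abstract does not describe the actual structure sizes or recovery logic.

```python
from collections import deque


class Renamer:
    """Toy register renamer with a free list and an in-order retire queue."""

    def __init__(self, num_arch=4, num_phys=8):
        # Hypothetical sizes, chosen only to keep the example small.
        self.map = {a: a for a in range(num_arch)}         # architectural -> physical
        self.free_list = deque(range(num_arch, num_phys))  # unused physical registers
        self.retire_queue = deque()                        # (dest, new phys, old phys)

    def dispatch(self, dest, srcs):
        """Rename one operation in program order; return (phys dest, phys srcs)."""
        if not self.free_list:
            raise RuntimeError("stall: no free physical register")
        # Sources are looked up before the destination is remapped.
        phys_srcs = [self.map[s] for s in srcs]
        new_phys = self.free_list.popleft()
        old_phys = self.map[dest]
        self.map[dest] = new_phys
        self.retire_queue.append((dest, new_phys, old_phys))
        return new_phys, phys_srcs

    def retire(self):
        """Retire the oldest operation and recycle the mapping it replaced."""
        dest, new_phys, old_phys = self.retire_queue.popleft()
        self.free_list.append(old_phys)
        return dest, new_phys


r = Renamer()
print(r.dispatch(0, [1, 2]))  # first write to register 0 gets a fresh physical register
print(r.dispatch(1, [0, 0]))  # later readers of register 0 see the renamed register
print(r.retire())             # oldest operation retires; its old mapping is freed
```

In this model, dispatch and retire both proceed in program order, while anything between them (scheduling and execution) is free to reorder because each result has its own physical register.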
Keywords :
coprocessors; floating point arithmetic; formal verification; instruction sets; scheduling; AES instruction set; AMD Jaguar x86 core; AVX instruction set; CLMUL instruction set; CMOS process; F16C instruction set; FPU; PRF; SSE4.2 instruction set; arithmetic and logic unit; bypass network; complementary metal oxide semiconductor; coprocessor model; dedicated renamer; dual-issue scheduler; execution pipe; floating-point adder; floating-point iterative multiplier; formal verification technique; free list; low-power design; native floating-point unit; physical register file; pseudorandom verification technique; retire queue; size 28 nm; vector integer ALU; vector integer MUL unit; x86 ISA extension; Adders; Decoding; Microarchitecture; Optimization; Out of order; Registers; Vectors; AES; AMD Jaguar; AVX; CLMUL; F16C; MMX; SSE; floating-point unit; industry implementation; x87
Conference_Title :
Computer Arithmetic (ARITH), 2013 21st IEEE Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4673-5644-2
DOI :
10.1109/ARITH.2013.24