DocumentCode :
1514477
Title :
Energy-Efficient Floating-Point Unit Design
Author :
Galal, Sameh ; Horowitz, Mark
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
Volume :
60
Issue :
7
fYear :
2011
fDate :
7/1/2011
Firstpage :
913
Lastpage :
922
Abstract :
Energy-efficient computation is critical if we are going to continue to scale performance in power-limited systems. For floating-point applications that have large amounts of data parallelism, one should optimize the throughput/mm² given a power density constraint. We present a method for creating a trade-off curve that can be used to estimate the maximum floating-point performance given a set of area and power constraints. Looking at FP multiply-add units and ignoring register and memory overheads, we find that in a 90 nm CMOS technology at 1 W/mm², one can achieve a performance of 27 GFlops/mm² single precision, and 7.5 GFlops/mm² double precision. Adding register file overheads reduces the throughput by less than 50 percent if the compute intensity is high. Since the energy of the basic gates is no longer scaling rapidly, maintaining constant power density with scaling requires moving the overall FP architecture to a lower energy/performance point. A 1 W/mm² design at 90 nm is a "high-energy" design, so scaling it to a lower energy design in 45 nm still yields a 7× performance gain, while a more balanced 0.1 W/mm² design only speeds up by 3.5× when scaled to 45 nm. Performance scaling below 45 nm rapidly decreases, with a projected improvement of only ~3× for both power densities when scaling to a 22 nm technology.
Keywords :
CMOS logic circuits; adders; floating point arithmetic; logic design; low-power electronics; multiplying circuits; nanoelectronics; power aware computing; CMOS technology; FP architecture; FP multiply-add units; area constraint; data parallelism; energy-efficient computation; energy-efficient floating-point unit design; floating-point application; floating-point performance; high-energy design; logic structure; power density constraint; power-limited system; register file overhead; trade-off curve; Computer architecture; Energy efficiency; Optimization; Pipeline processing; Registers; Threshold voltage; Throughput; Arithmetic and logic structures; floating point; fused multiply-add; high-speed arithmetic; throughput/mm² optimization
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2010.121
Filename :
5483287