DocumentCode
3497481
Title
REEL: Reducing effective execution latency of floating point operations
Author
Reddy, Veerababu ; Gilani, Syed Zulqarnain ; Gunadi, Erika ; Kim, Nam Sung ; Schulte, M.J. ; Lipasti, Mikko H.
Author_Institution
Univ. of Wisconsin-Madison, Madison, WI, USA
fYear
2013
fDate
4-6 Sept. 2013
Firstpage
187
Lastpage
192
Abstract
The height of the dynamic dependence graph of a program, as executed by a processor, sets a lower bound on its execution time. This height can be decreased by reducing the effective execution latency of operations that form dependence chains in the graph. In this paper, we propose a technique called REEL that reduces the overall latency of chains of dependent floating point (FP) operations by increasing the throughput of computation. REEL comprises a high-throughput floating point unit (HFP) that allows early issue of an FP Add that is dependent on another FP Add or FP Multiply. This is complemented by instruction scheduler modifications that allow early issue of dependent FP Adds, and by novel checker logic that corrects any precision errors. Unlike conventional static operation fusion, such as fused Multiply-Add (FMA), REEL requires no changes to the instruction set to use the new hardware, and no recompilation is necessary. Furthermore, unlike ISA-level FMA, our technique produces results that are bit-compatible while boosting the performance of Add-Add dependence pairs in addition to Multiply-Add pairs. Our evaluation of REEL using CFP2006 benchmarks shows an average performance gain of 7.6% and a maximum performance gain of 17% while consuming 1.2% less energy.
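The abstract's first claim, that the height of the dependence graph bounds execution time and shrinks when dependent FP Adds can issue early, can be illustrated with a small sketch. This is not the paper's model: the latency values, the `EARLY_FORWARD_LATENCY` constant, and the function names are all assumptions chosen only to make the arithmetic concrete.

```python
# Hypothetical latencies in cycles (illustrative assumptions, not from the paper).
FULL_LATENCY = {"fmul": 4, "fadd": 4}
# Assumed delay between a producer issuing and a dependent FP Add issuing early.
EARLY_FORWARD_LATENCY = 2


def edge_latency(producer_kind, consumer_kind, early_issue):
    """Cycles a consumer must wait after its producer issues."""
    if early_issue and consumer_kind == "fadd" and producer_kind in ("fadd", "fmul"):
        return EARLY_FORWARD_LATENCY
    return FULL_LATENCY[producer_kind]


def graph_height(kinds, producers, early_issue=False):
    """Longest path (in cycles) through a dependence graph.

    kinds maps each operation name to "fmul" or "fadd";
    producers maps an operation name to the operations it depends on.
    """
    issue_time = {}

    def issue(node):
        # Earliest issue cycle: all producers must have forwarded their result.
        if node not in issue_time:
            issue_time[node] = max(
                (issue(p) + edge_latency(kinds[p], kinds[node], early_issue)
                 for p in producers.get(node, [])),
                default=0,
            )
        return issue_time[node]

    # Every operation still takes its full latency to complete.
    return max(issue(n) + FULL_LATENCY[kinds[n]] for n in kinds)


# Chain: m = a * b; s1 = m + c; s2 = s1 + d; s3 = s2 + e
kinds = {"m": "fmul", "s1": "fadd", "s2": "fadd", "s3": "fadd"}
producers = {"s1": ["m"], "s2": ["s1"], "s3": ["s2"]}

baseline = graph_height(kinds, producers)
with_early_issue = graph_height(kinds, producers, early_issue=True)
print(baseline, with_early_issue)
```

Under these assumed numbers the four-operation chain drops from 16 to 10 cycles. Each operation still takes its full latency; only the wait before a dependent FP Add can issue is shortened.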
Keywords
benchmark testing; floating point arithmetic; formal logic; graph theory; instruction sets; scheduling; CFP2006 benchmarks; FP add; FP multiply; HFP; ISA-level FMA; REEL; add-add dependence pairs; checker logic; dependent FP operations; dependent floating point operations; dynamic dependence graph; execution latency; floating point operations; fused multiply-add; high-throughput floating point unit; instruction scheduler modifications; instruction set; multiply-add pairs; precision errors; static operation fusion; Adders; Benchmark testing; Performance gain; Pipelines; Ports (Computers); Registers; Throughput;
fLanguage
English
Publisher
IEEE
Conference_Titel
Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on
Conference_Location
Beijing
Print_ISBN
978-1-4799-1234-6
Type
conf
DOI
10.1109/ISLPED.2013.6629292
Filename
6629292