DocumentCode :
2866981
Title :
Latency Sensitive FMA Design
Author :
Galal, Sameh ; Horowitz, Mark
Author_Institution :
Stanford Univ., Stanford, CA, USA
fYear :
2011
fDate :
25-27 July 2011
Firstpage :
129
Lastpage :
138
Abstract :
The implementation of merged floating-point multiply-add operations can be optimized in many ways. For latency-sensitive applications, our cascade design reduces the accumulation-dependent latency by 2x over a fused design, at the cost of a 13% increase in non-accumulation-dependent latency. A simple in-order execution model shows this design is superior in most applications, providing a 12% average reduction in FP stalls and improving performance by up to 6%. Simulations of superscalar out-of-order machines show a 4% average improvement in CPI in 2-way machines and 4.6% in 4-way machines. The cascade design has the same area and energy budget as a traditional fused multiply-add (FMA).
Keywords :
floating point arithmetic; logic design; cascade design; latency sensitive FMA design; merged floating-point multiply-add operations; superscalar out-of-order machines; Adders; Benchmark testing; Clocks; Graphics processing unit; Out of order; Parallel processing; Pipelines; Fused Multiply Add
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Arithmetic (ARITH), 2011 20th IEEE Symposium on
Conference_Location :
Tubingen
ISSN :
1063-6889
Print_ISBN :
978-1-4244-9457-6
Type :
conf
DOI :
10.1109/ARITH.2011.26
Filename :
5992118