Regional Information Center for Science and Technology (RICeST)

DocumentCode :

1926546

Title :

A mixed-precision fused multiply and add

Author :

Brunie, Nicolas ; De Dinechin, Florent ; De Dinechin, Benoit

Author_Institution :

Kalray, USA

fYear :

2011

fDate :

6-9 Nov. 2011

Firstpage :

165

Lastpage :

169

Abstract :

The floating-point fused multiply-add (FMA), which computes R = AB + C with a single rounding, is now an IEEE-754 standard operator. This article investigates variants in which the addend C and the result R are in a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are in a smaller format, for instance binary32 (single precision). Like the standard FMA operator, the proposed mixed-precision operator computes AB + C with a single rounding and fully supports subnormals. With minor modifications, it can also perform the standard FMA in the smaller format and the standard addition in the larger format. For sum-of-products applications, the proposed mixed-precision FMA provides the accumulation accuracy of the larger format at a cost shown to be only one third more than that of a classical FMA in the smaller format. We also show that such a mixed-precision FMA, although not mentioned in existing standards (IEEE 754, C, and Fortran), is fully compliant with them. For DSP and embedded applications, a mixed binary32/binary64 FMA will enable binary64 computing where it is most needed, at a small cost overhead relative to current binary32 FMAs, and with fewer data transfers, hence lower power, than a pure binary64 approach. In high-end processors, a mixed binary64/binary128 FMA could provide an adequate solution to the binary128 requirements of very large scale computing applications.

Keywords :

IEEE standards; adders; digital signal processing chips; embedded systems; floating point arithmetic; multiplying circuits; DSP; IEEE-754 standard operator; accumulation accuracy; binary128 requirements; data transfers; embedded applications; floating point fused multiply-add; high-end processors; mixed binary32-binary64 FMA; mixed-precision fused multiply-add; mixed-precision operator; standard FMA operator; sum-of-product applications; very large scale computing applications; Accuracy; Architecture; Computer architecture; Context; Digital signal processing; Optimization; Program processors; Floating-point; dot product; fused multiply-add; mixed precision;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2011 Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)

Conference_Location :

Pacific Grove, CA

ISSN :

1058-6393

Print_ISBN :

978-1-4673-0321-7

Type :

conf

DOI :

10.1109/ACSSC.2011.6189977

Filename :

6189977

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1926546