Title :
Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores
Author :
Jeannerod, Claude-Pierre ; Revy, Guillaume
Author_Institution :
INRIA Grenoble, École Normale Supérieure de Lyon, Lyon, France
Abstract :
This paper presents an optimized software implementation of the reciprocal square root function $x \mapsto x^{-1/2}$ for IEEE binary32 floating-point data, with correct rounding to nearest. The main feature of this implementation is its high exposure of instruction-level parallelism (ILP), which results from an extension of a previously proposed bivariate polynomial evaluation-based method and from the design of a specific rounding procedure. This implementation proves to be very efficient on some VLIW processor cores such as STMicroelectronics' ST231 (used mainly for embedded media processing), for which a low latency of 29 cycles has been measured.
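To make "correct rounding to nearest" concrete, here is a minimal Python sketch of a slow reference oracle for binary32 reciprocal square roots. It is not the paper's method (which relies on bivariate polynomial evaluation and a dedicated rounding procedure); it only illustrates the property the result must satisfy. The function names `to_f32`, `f32_neighbors`, `position`, `is_correctly_rounded`, and `rsqrt_rn_reference` are hypothetical, and the input is assumed to be a positive finite binary32 value.

```python
import math
import struct
from fractions import Fraction


def to_f32(value):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def f32_neighbors(r):
    """Return the binary32 values just below and just above r.

    Assumes r is a positive, finite, nonzero binary32 value that is not
    the largest finite one (always true for results of x^(-1/2))."""
    bits = struct.unpack("<I", struct.pack("<f", r))[0]
    below = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return below, above


def position(x, r):
    """Locate x^(-1/2) relative to the rounding interval of r, exactly.

    Returns -1 if x^(-1/2) is below the interval, +1 if above, 0 if inside.
    For a positive midpoint m, m < x^(-1/2) is equivalent to m*m*x < 1,
    which is tested in exact rational arithmetic."""
    below, above = f32_neighbors(r)
    fx, fr = Fraction(x), Fraction(r)
    lo_mid = (Fraction(below) + fr) / 2
    hi_mid = (fr + Fraction(above)) / 2
    if lo_mid * lo_mid * fx > 1:
        return -1
    if hi_mid * hi_mid * fx < 1:
        return 1
    return 0  # an exact tie would also land here; ties are not resolved


def is_correctly_rounded(x, r):
    """True if r is a nearest binary32 value to x^(-1/2)."""
    return position(x, r) == 0


def rsqrt_rn_reference(x):
    """Reference x^(-1/2) rounded to nearest, for positive finite binary32 x."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be positive and finite")
    r = to_f32(1.0 / math.sqrt(x))  # initial guess computed in binary64
    while True:
        pos = position(x, r)
        if pos == 0:
            return r
        below, above = f32_neighbors(r)
        r = below if pos < 0 else above  # step one binary32 value toward the target


print(rsqrt_rn_reference(4.0))
```

An oracle of this kind can be used to check a fast implementation against exact arithmetic on chosen inputs; it says nothing about latency or ILP, which are the focus of the paper.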
Keywords :
microprocessor chips; multiprocessing systems; optimisation; parallel machines; IEEE binary32 floating point data; STMicroelectronics ST231; VLIW cores; correctly rounded reciprocal square roots optimization; embedded media processing; instruction level parallelism; Counting circuits; Delay; Digital signal processing; Embedded software; Floating-point arithmetic; Parallel processing; Polynomials; Registers; Scientific computing; VLIW; VLIW processor core; binary floating-point arithmetic; correct rounding (to nearest); polynomial evaluation; reciprocal square root; software implementation;
Conference_Title :
Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on
Conference_Location :
Pacific Grove, CA
Print_ISBN :
978-1-4244-5825-7
DOI :
10.1109/ACSSC.2009.5469948