• DocumentCode
    1484652
  • Title

    Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support

  • Author

    Huang, Libo ; Ma, Sheng ; Shen, Li ; Wang, Zhiying ; Xiao, Nong

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
  • Volume
    61
  • Issue
    5
  • fYear
    2012
  • fDate
    5/1/2012 12:00:00 AM
  • Firstpage
    745
  • Lastpage
    751
  • Abstract
    Binary64 arithmetic is rapidly becoming inadequate to cope with today's large-scale computations due to an accumulation of errors. Therefore, binary128 arithmetic is now required to increase the accuracy and reliability of these computations. At the same time, an obvious trend emerging in modern processors is to extend their instruction sets by allowing single instruction multiple data (SIMD) execution, which can significantly accelerate data-parallel applications. To address the combined demands mentioned above, this paper presents the architecture of a low-cost binary128 floating-point fused multiply-add (FMA) unit with SIMD support. The proposed FMA design can execute a binary128 FMA every other cycle with a latency of four cycles, or two binary64 FMAs fully pipelined with a latency of three cycles, or four binary32 FMAs fully pipelined with a latency of three cycles. We use two binary64 FMA units to support binary128 FMA, which requires much less hardware than a fully pipelined binary128 FMA. The presented binary128 FMA design uses both segmentation and iteration hardware vectorization methods to trade off performance, such as throughput and latency, against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.
  • Keywords
    parallel processing; pipeline arithmetic; SIMD support; binary32 FMAs; binary64 arithmetic; data parallel applications; dynamic power dissipation; iteration hardware; low cost binary128 floating point FMA; segmentation hardware; single instruction multiple data execution; unit design; vectorization methods; Adders; Compounds; Computer architecture; Hardware; Multiplexing; Pipelines; Program processors; Floating point; SIMD; binary128; computer arithmetic; fused multiply add; implementation
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2011.77
  • Filename
    5740858
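
The throughput and latency figures quoted in the abstract can be turned into a rough cycle count for a batch of independent FMAs. The sketch below is only an illustration under stated assumptions: the lane counts, issue intervals, and latencies come from the abstract, while the pipeline formula (latency plus one issue interval per additional issue slot), the assumption that all operations are independent, and the names `MODES` and `cycles_for` are hypothetical and not taken from the paper.

```python
from math import ceil

# Per-mode figures as stated in the abstract:
# (FMAs issued together, cycles between issues, latency in cycles).
MODES = {
    "binary128": (1, 2, 4),  # one FMA every other cycle, latency 4
    "binary64": (2, 1, 3),   # two FMAs per cycle, latency 3
    "binary32": (4, 1, 3),   # four FMAs per cycle, latency 3
}


def cycles_for(mode, n_ops):
    """Estimate cycles to finish n_ops independent FMAs in the given mode.

    Assumes a simple pipeline model with no stalls or data dependencies;
    this model is an assumption, not a result from the paper.
    """
    lanes, interval, latency = MODES[mode]
    if n_ops <= 0:
        return 0
    issues = ceil(n_ops / lanes)  # number of issue slots needed
    return latency + (issues - 1) * interval


if __name__ == "__main__":
    for mode in MODES:
        print(mode, cycles_for(mode, 8))
```

Under this model, a longer batch makes the issue interval dominate over latency, which is the trade-off the abstract describes for the iterated binary128 mode.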