• DocumentCode
    2786450
  • Title

    Floating-Point Accumulation Circuit for Matrix Applications

  • Author

    Bodnar, M.R. ; Humphrey, J.R. ; Curt, Petersen F. ; Prather, Dennis W.

  • Author_Institution
    Delaware Univ., Newark, DE
  • fYear
    2006
  • fDate
    24-26 April 2006
  • Firstpage
    303
  • Lastpage
    304
  • Abstract
    Many scientific algorithms require floating-point reduction operations, or accumulations, including matrix-vector-multiply (MVM), vector dot-products, and the discrete cosine transform (DCT). Because FPGA implementations of each of these algorithms are desirable, a high-performance floating-point accumulation unit is necessary. However, this type of circuit is difficult to design in an FPGA environment because the floating-point arithmetic units must be deeply pipelined to attain high-performance designs (Durbano et al., 2004; Leeser and Wang, 2004). A deep pipeline requires special handling in feedback circuits because of the long delay, which is further complicated by a continuous input data stream. Proposed accumulator architectures that overcome such performance bottlenecks are described in Zhuo et al. (2005) and Zhuo and Prasanna (2005). This paper presents a floating-point accumulation circuit that is a natural evolution of this work. The system can handle streams of arbitrary length, requires modest area, and can handle interrupted data inputs. In contrast to the designs proposed by Zhuo et al., the proposed architecture maintains buffers for partial result storage that use significantly fewer embedded memory resources, while maintaining fixed size and speed characteristics regardless of stream length. The results for both single- and double-precision accumulation architectures were verified in a Virtex-II 8000-4 part clocked at more than 150 MHz, and the power of this design was demonstrated in a computationally intense matrix-matrix-multiply application.
  • Keywords
    discrete cosine transforms; floating point arithmetic; logic circuits; matrix algebra; pipeline processing; Virtex-II 8000-4; accumulator architectures; floating-point accumulation circuit; floating-point reduction; interrupted data; matrix applications; matrix-matrix-multiply; scientific algorithms; Arithmetic; Buffer storage; Clocks; Computer applications; Computer architecture; Delay; Feedback circuits; Field programmable gate arrays
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Field-Programmable Custom Computing Machines, 2006. FCCM '06. 14th Annual IEEE Symposium on
  • Conference_Location
    Napa, CA
  • Print_ISBN
    0-7695-2661-6
  • Type

    conf

  • DOI
    10.1109/FCCM.2006.41
  • Filename
    4020931