• DocumentCode
    1786838
  • Title

    An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers

  • Author

    Cong, J. ; Peng Li ; Bingjun Xiao ; Peng Zhang

  • Author_Institution
    Comput. Sci. Dept. & Electr. Eng. Dept., Univ. of California, Los Angeles, Los Angeles, CA, USA
  • fYear
    2014
  • fDate
    1-5 June 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    High-level synthesis (HLS) tools have made significant progress in compiling high-level descriptions of computation into highly pipelined register-transfer level (RTL) specifications. Such high-throughput computation creates a high demand for data. To prevent data accesses from becoming the bottleneck, on-chip memories are used as data reuse buffers to reduce off-chip accesses. Memory partitioning is also explored to increase memory bandwidth by scheduling multiple simultaneous memory accesses to different memory banks. Prior work on memory partitioning of data reuse buffers is limited to uniform partitioning. In this paper, we perform an early-stage exploration of non-uniform memory partitioning. We use stencil computation, a popular communication-intensive application domain, as a case study to show the potential benefits of non-uniform memory partitioning. Our novel method always achieves both the minimum memory size and the minimum number of memory banks, which no prior work can guarantee. We develop a generalized microarchitecture that decouples stencil accesses from computation, and an automated design flow that integrates our microarchitecture with the HLS-generated computation kernel to form a complete accelerator.
  • Keywords
    high level synthesis; integrated memory circuits; logic partitioning; HLS tools; HLS-generated computation kernel; communication-intensive application domain; data access; data reuse buffers; high-level descriptions; high-level synthesis tools; highly pipelined register-transfer level specifications; memory access; memory bandwidth; memory banks; nonuniform memory partitioning; off-chip access; on-chip memories; optimal microarchitecture; stencil computation acceleration; Arrays; Clocks; Kernel; Microarchitecture; Ports (Computers); Radiation detectors; System-on-chip;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE
  • Conference_Location
    San Francisco, CA
  • Type

    conf

  • Filename
    6881404