• DocumentCode
    2073950
  • Title

    High-level synthesis of multiple dependent CUDA kernels on FPGA

  • Author

    Gurumani, Swathi T. ; Cholakkal, H. ; Yun Liang ; Rupnow, Kyle ; Deming Chen

  • Author_Institution
    Adv. Digital Sci. Center, Singapore, Singapore
  • fYear
    2013
  • fDate
    22-25 Jan. 2013
  • Firstpage
    305
  • Lastpage
    312
  • Abstract
    High-level synthesis (HLS) tools provide automatic generation of hardware at the register transfer level (RTL) from algorithm descriptions written in high-level languages, enabling faster creation of custom accelerators for FPGA architectures. Existing HLS tools support a wide variety of input languages, and assist users in design space exploration through automation and feedback on designs' performance bottlenecks. This design space exploration applies techniques such as pipelining, partitioning and resource sharing in order to improve performance and resource utilization. However, although automated exploration can find some inherent parallelism, data-parallel input source code is still superior for exposing a greater variety of parallelism. In prior work, we demonstrated automated design space exploration of GPU multi-threaded (CUDA) language source code for efficient RTL generation. In this paper, we examine the challenges in extending this automated design space exploration to multiple dependent CUDA kernels, demonstrate a step-by-step procedure for efficiently performing multi-kernel synthesis, and demonstrate the potential of this approach through a case study of a stereo matching algorithm. This study demonstrates that HLS of multiple dependent CUDA kernels can maintain performance parity with the GPU implementation, while consuming over 16X less energy than the GPU. Based on our manual procedure, we identify the key challenges in fully automating the synthesis of multi-kernel CUDA programs.
  • Keywords
    electronic design automation; field programmable gate arrays; network synthesis; CUDA kernels; FPGA; GPU multi-threaded language source code; RTL generation; automated design space exploration; compute unified device architecture; custom accelerators; data parallel input source code; high-level synthesis; multikernel CUDA programs; register transfer level; resource sharing; resource utilization; stereo matching algorithm; Analytical models; Computational modeling; Field programmable gate arrays; Graphics processing units; Kernel; Parallel processing; Space exploration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design Automation Conference (ASP-DAC), 2013 18th Asia and South Pacific
  • Conference_Location
    Yokohama
  • ISSN
    2153-6961
  • Print_ISBN
    978-1-4673-3029-9
  • Type

    conf

  • DOI
    10.1109/ASPDAC.2013.6509613
  • Filename
    6509613