• DocumentCode
    2958559
  • Title

    Generating Device-specific GPU Code for Local Operators in Medical Imaging

  • Author

    Membarth, Richard ; Hannig, Frank ; Teich, Jürgen ; Körner, Mario ; Eckert, Wieland

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Erlangen-Nuremberg, Erlangen, Germany
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    569
  • Lastpage
    581
  • Abstract
    To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler translates this high-level description into low-level CUDA and OpenCL code with automatic support for boundary handling and filter masks. Taking the annotated metadata and the characteristics of the parallel GPU execution model into account, two-layered parallel implementations, utilizing SPMD and MPMD parallelism, are generated. An abstract hardware model of graphics card architectures makes it possible to model GPUs of multiple vendors like AMD and NVIDIA, and to generate device-specific code for multiple targets. It is shown that the generated code is faster than manual implementations and those relying on hardware support for boundary handling. Implementations from RapidMind, a commercial framework for GPU programming, are outperformed, and results similar to those of the GPU backend of the widely used image processing library OpenCV are achieved.
  • Keywords
    C++ language; graphics processing units; medical image processing; parallel architectures; AMD; C++; GPU accelerator; GPU programming; MPMD parallelism; NVIDIA; OpenCL code; RapidMind; SPMD parallelism; abstract hardware model; boundary handling; decoupled access-execute metadata; device-specific GPU code; domain-specific language; execution constraint; filter mask; graphics card architecture; image processing kernel; local operator; low-level CUDA; medical imaging; memory access pattern; parallel GPU execution model; source-to-source compiler; Biomedical imaging; Graphics processing unit; Hardware; Image processing; Kernel; Programming; CUDA; GPU; OpenCL; code generation; domain-specific language; local operators; medical imaging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-0975-2
  • Type

    conf

  • DOI
    10.1109/IPDPS.2012.59
  • Filename
    6267859