• DocumentCode
    3575173
  • Title

    A Compiler Translate Directive-Based Language to Optimized CUDA

  • Author

    Feng Li ; Hong An ; Weihao Liang ; Xiaoqiang Li ; Yichao Cheng ; Xia Jiang

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • Firstpage
    982
  • Lastpage
    989
  • Abstract
    Graphics processing units (GPUs) provide a low-cost platform for accelerating high-performance computations. New programming languages, such as CUDA and OpenCL, make GPU programming attractive to programmers. However, programming GPUs is still a cumbersome task for two reasons: tedious performance optimizations and lack of portability. First, optimizing an algorithm for a specific GPU is a time-consuming task that requires a thorough understanding of both the algorithm and the underlying hardware. Unoptimized CUDA programs typically achieve only a small fraction of the peak GPU performance. Second, CUDA programs lack performance portability between different GPUs. Moving code from one GPU to another while maintaining the desired performance is a non-trivial task that often requires significant time. In this paper, we propose an optimizing compiler that compiles a representative high-level directive-based language to CUDA and is capable of performing a wide variety of optimizations to generate efficient code for GPUs. We alleviate the portability problem of current GPU programming methods by using a high-level directive-based language that provides a unified abstraction for currently popular CPU-GPU heterogeneous systems. Various optimizations, mainly memory system optimizations, are automatically applied by our compiler to produce optimized CUDA code for GPUs. Experiments on the Rodinia benchmark with three different input sizes show that our compiler achieves, on average, 70%, 75%, and 84% of the performance of hand-written code, respectively.
  • Keywords
    graphics processing units; optimising compilers; parallel architectures; parallel languages; parallel programming; CUDA code; CUDA programs; GPU performance; GPU programming; GPU programming methods; OpenCL; hand-written code; high level directive-based language; high performance computations; memory system optimizations; optimized compiler; performance portability problem; programming languages; rodinia benchmark; Arrays; Graphics processing units; Hardware; Instruction sets; Kernel; Optimization; Parallel processing; Compiler; Directive-based language; GPU; Performance Optimization; Portability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS)
  • Print_ISBN
    978-1-4799-6122-1
  • Type

    conf

  • DOI
    10.1109/HPCC.2014.162
  • Filename
    7056864