• DocumentCode
    3240374
  • Title
    Exploration of automatic optimization for CUDA programming
  • Author
    Al-Mouhamed, Mayez; ul Hassan Khan, A.
  • Author_Institution
    King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
  • fYear
    2012
  • fDate
    6-8 Dec. 2012
  • Firstpage
    55
  • Lastpage
    60
  • Abstract
    Graphics processing units (GPUs) are gaining ground in high-performance computing. CUDA, an extension to C, is the most widely used parallel programming framework for general-purpose GPU computation. However, writing an optimized CUDA program is complex even for experts. We present a method for restructuring loops into optimized CUDA kernels based on a three-step algorithm: loop tiling, coalesced memory access, and resource optimization. We also establish the relationships between the influencing parameters and propose a method for finding tiling solutions with coalesced memory access that best meet the identified constraints. In addition, we present a simplified algorithm for restructuring loops and rewriting them as efficient CUDA kernels. In the execution model of the synthesized kernel, the kernel threads are distributed uniformly to keep all cores busy, while data with tailored locality is transferred and accessed in a coalesced pattern to amortize the long latency of the secondary memory. In the evaluation, we implement several simple applications using the proposed restructuring strategy and evaluate their performance in terms of execution time and GPU throughput.
  • Keywords
    graphics processing units; optimisation; parallel architectures; parallel programming; 3-step algorithm; CUDA programming; GPU; automatic optimization; coalesced memory access; general purpose GPU computations; graphic processing units; high-performance computing; loop tiling; optimized CUDA kernels; parallel programming framework; restructuring loops; secondary memory; tailored data; Algorithm design and analysis; Graphics processing units; Prediction algorithms; Programming; CUDA; Compiler Transformations; GPGPU; GPU; Parallel Programming; directive-based language; source-to-source compiler
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC)
  • Conference_Location
    Solan
  • Print_ISBN
    978-1-4673-2922-4
  • Type
    conf
  • DOI
    10.1109/PDGC.2012.6449791
  • Filename
    6449791