  • DocumentCode
    144596
  • Title
    Dynamic memory optimization and parallelism management for OpenCL
  • Author
    Chao-Hung Hsu ; I-Wei Wu ; Shann, Jean Jyh-Jiun
  • Author_Institution
    Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • Volume
    2
  • fYear
    2014
  • fDate
    26-28 April 2014
  • Firstpage
    776
  • Lastpage
    780
  • Abstract
    Multiprocessor platforms have recently become a common way to achieve high performance. OpenCL (Open Computing Language) is one of the programming standards for heterogeneous multiprocessors and provides portability across these platforms. Our research focuses on platforms with CPUs and GPUs, since GPUs are now in widespread use. On such a platform, two programming issues can significantly affect GPU computing performance: workload distribution and use of the GPU memory hierarchy. To fully exploit the characteristics of GPUs, programmers must be not only proficient at parallel programming but also familiar with hardware specifications. In this paper, we therefore propose a compilation pass that automatically optimizes OpenCL kernels. The pass transforms a naïve input kernel function by applying kernel function analysis, work-group rearrangement, memory coalescing, and work-item merging. In addition, our framework is implemented in a runtime system so that it can dynamically adjust the optimization parameters according to the hardware specifications. In terms of execution time, the optimized kernels generated by our design may show significant performance improvement over the naïve versions. Although optimizations performed at runtime may incur time overhead, in most cases this overhead is offset by intensive kernel computation or massive input data.
  • Keywords
    graphics processing units; multiprocessing systems; operating system kernels; optimising compilers; parallel programming; software performance evaluation; storage management; CPU; GPU memory hierarchy; Open Computing Language; OpenCL; OpenCL kernel optimization; compilation pass; dynamic memory optimization; dynamic optimizing parameter adjustment; hardware specifications; heterogeneous multiprocessors; kernel computation; kernel function analysis; memory coalescing; multiprocessor platforms; naive kernel function; parallel programming; parallelism management; performance improvement; portability; runtime system; work load distribution; work-group rearrangement; work-item merging; Graphics processing units; Kernel; Memory management; Optimization; Parallel processing; Random access memory; Registers; GPU; LLVM; OpenCL; dynamic optimization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science, Electronics and Electrical Engineering (ISEEE), 2014 International Conference on
  • Conference_Location
    Sapporo
  • Print_ISBN
    978-1-4799-3196-5
  • Type
    conf
  • DOI
    10.1109/InfoSEEE.2014.6947772
  • Filename
    6947772