• DocumentCode
    2524623
  • Title
    Multi-kernel Auto-Tuning on GPUs: Performance and Energy-Aware Optimization
  • Author
    Guerreiro, Joao ; Ilic, Aleksandar ; Roma, Nuno ; Tomas, Pedro
  • Author_Institution
    Inst. Super. Tecnico, Univ. de Lisboa, Lisbon, Portugal
  • fYear
    2015
  • fDate
    4-6 March 2015
  • Firstpage
    438
  • Lastpage
    445
  • Abstract
    Prompted by their very high computational capabilities and memory bandwidth, Graphics Processing Units (GPUs) are already widely used to accelerate the execution of many scientific applications. However, programmers are still required to have very detailed knowledge of the GPU internal architecture when tuning kernels to improve either performance or energy efficiency. Moreover, because different GPU devices have different characteristics, moving a kernel to a different GPU typically requires re-tuning the kernel execution in order to efficiently exploit the underlying hardware. The procedure proposed in this work is based on real-time kernel profiling and GPU monitoring, and it automatically tunes parameters from several concurrent kernels to maximize performance or minimize energy consumption. Experimental results on NVIDIA GPU devices with up to 4 concurrent kernels show that the proposed solution achieves near-optimal configurations. Furthermore, significant energy savings can be achieved by using the proposed energy-efficiency auto-tuning procedure.
  • Keywords
    graphics processing units; performance evaluation; power aware computing; GPU devices; GPU internal architecture; GPU monitoring; NVIDIA GPU devices; energy aware optimization; energy efficiency autotuning procedure; graphics processing units; memory bandwidth; multikernel auto tuning; performance optimization; real-time kernel profiling; scientific applications; Energy consumption; Frequency measurement; Graphics processing units; Instruction sets; Kernel; Optimization; Performance evaluation; CUDA; GPGPU; GPU; OpenCL; auto-tuning; energy-awareness; multikernel;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2015 23rd Euromicro International Conference on
  • Conference_Location
    Turku
  • ISSN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2015.44
  • Filename
    7092758