• DocumentCode
    3596105
  • Title

    Effective sampling-driven performance tools for GPU-accelerated supercomputers

  • Author

    Chabbi, Milind ; Murthy, K. ; Fagan, Michael ; Mellor-Crummey, John

  • Author_Institution
    Dept. of Comput. Sci., Rice Univ., Houston, TX, USA
  • fYear
    2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Performance analysis of GPU-accelerated systems requires a system-wide view that considers both CPU and GPU components. In this paper, we describe how to extend system-wide, sampling-based performance analysis methods to GPU-accelerated systems. Since current GPUs do not support sampling, our implementation required careful coordination of instrumentation-based performance data collection on GPUs with sampling-based methods employed on CPUs. In addition, we introduce a novel technique for analyzing systemic idleness in CPU/GPU systems. We demonstrate the effectiveness of our techniques with application case studies on Titan and Keeneland. Highlights of our case studies include: 1) we improved performance for LULESH 1.0 by 30%, 2) we identified a hardware performance problem on Keeneland, 3) we identified a scaling problem in LAMMPS derived from CUDA initialization, and 4) we identified a performance problem caused by GPU synchronization operations that are delayed by blocking system calls.
  • Keywords
    graphics processing units; parallel architectures; parallel machines; performance evaluation; sampling methods; synchronisation; CPU-GPU systems; CUDA initialization; GPU synchronization operations; GPU-accelerated supercomputers; GPU-accelerated systems; Keeneland; LAMMPS; LULESH 1.0; Titan; blocking system calls; hardware performance problem; instrumentation-based performance data collection; sampling-driven performance tools; system-wide sampling-based performance analysis methods; Context; Graphics processing units; Instruments; Kernel; Measurement; Radiation detectors; Tuning; CPU-GPU blame shifting; Call path profiling; Heterogeneous architectures; Performance analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
  • Print_ISBN
    978-1-4503-2378-9
  • Type

    conf

  • DOI
    10.1145/2503210.2503299
  • Filename
    6877476