Title :
Phase-Based Profiling in GPGPU Kernels
Author :
Dietrich, Robert ; Schmitt, Felix ; Widera, René ; Bussmann, Michael
Author_Institution :
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
Abstract :
A growing number of computationally intensive scientific applications use hardware accelerators such as general-purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors, programming them is fairly complex and requires hardware-specific optimizations to utilize the full computing power. To achieve high performance, the critical parts of a program have to be identified and optimized. This paper proposes an approach for performance analysis of CUDA kernel source code regions that, for the first time, allows execution times to be measured within GPGPU kernels. We developed a tool that implements the presented method and helps application developers easily identify hot spots within a kernel. The tool uses compile-time code analysis to select suitable instrumentation points and instrument them automatically, keeping program perturbation minimal, and it also supports manual instrumentation. To the best of our knowledge, this is the first approach that allows for scalable runtime analysis within GPGPU kernels. Combined with existing performance analysis techniques, it helps developers exploit the full potential of modern parallel systems.
Keywords :
graphics processing units; multiprocessing systems; parallel architectures; program diagnostics; software metrics; CUDA; CUDA kernel source code regions; GPGPU kernels; compile time code analysis; execution time measurement; general purpose graphics processing units; hardware accelerators; instrumentation points; minimal program perturbation; multicore processors; parallel systems; performance analysis techniques; phase-based profiling; program identification; program optimization; runtime analysis; scientific applications; software development; Graphics processing unit; Hardware; Instruction sets; Instruments; Kernel; Radiation detectors; Runtime; CUDA; GPGPU; accelerators; many-core; performance analysis; profiling; tracing;
Conference_Title :
Parallel Processing Workshops (ICPPW), 2012 41st International Conference on
Conference_Location :
Pittsburgh, PA
Print_ISBN :
978-1-4673-2509-7
DOI :
10.1109/ICPPW.2012.59