Title :
Perf-Sat: Runtime Detection of Performance Saturation for GPGPU Applications
Author :
Awatramani, Mihir ; Zambreno, Joseph ; Rover, Diane
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
Abstract :
Graphics Processing Units (GPUs) achieve latency tolerance by exploiting massive amounts of thread-level parallelism. Each core executes several hundred to a few thousand simultaneously active threads. The work scheduler tries to maximize the number of active threads on each core by launching threads until at least one of the required resources is completely utilized. The rationale is that more threads give the thread scheduler more opportunities to hide memory latency and thus result in better performance. In this work, we show that launching the maximum number of threads is not always necessary to achieve the best performance. Applications have an optimal thread count at which performance saturates. Increasing the number of threads beyond this value yields no improvement and sometimes degrades performance. To this end, we develop Perf-Sat: a mechanism to detect, at runtime, the optimal number of threads required on each core. Perf-Sat is integrated into the hardware work scheduler and guides it to either increase or decrease the number of active threads. We evaluate the performance impact of our scheduler on two GPU generations and show that Perf-Sat scales well across different applications as well as architectures. With a performance loss of less than 1%, Perf-Sat achieves core resource savings of 18.32% on average.
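The saturation idea in the abstract can be illustrated with a small software sketch. This is not the Perf-Sat hardware mechanism, whose details the abstract does not give; it is a simple hill-climbing loop written under stated assumptions. The names `tune_thread_limit`, `measure_perf`, `step`, `tolerance`, and the `toy_perf` curve are all hypothetical and chosen only for illustration.

```python
def tune_thread_limit(measure_perf, max_threads, step=64, tolerance=0.01):
    """Grow the active-thread limit until the relative gain is `tolerance` or less.

    measure_perf: hypothetical callback returning a throughput figure
                  (higher is better) for a given active-thread limit.
    """
    limit = min(step, max_threads)
    perf = measure_perf(limit)
    while limit < max_threads:
        candidate = min(limit + step, max_threads)
        candidate_perf = measure_perf(candidate)
        # Keep the larger limit only if it bought a meaningful speedup.
        if candidate_perf <= perf * (1.0 + tolerance):
            break  # saturated or regressed: stay at the smaller limit
        limit, perf = candidate, candidate_perf
    return limit


def toy_perf(threads):
    # Assumed curve: linear gain up to 512 threads, slight loss beyond that.
    return min(threads, 512) - 0.05 * max(0, threads - 512)


if __name__ == "__main__":
    print(tune_thread_limit(toy_perf, max_threads=1536))  # prints 512
```

On the assumed curve the loop stops at 512 threads rather than the 1536 maximum, which mirrors the claim that launching the maximum number of threads is not always needed; the threads left unlaunched correspond to the core resources that could be saved.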
Keywords :
graphics processing units; multi-threading; performance evaluation; processor scheduling; GPGPU application; Perf-Sat; active threads; core resource savings; graphic processing units; hardware work scheduler; memory latency tolerance; optimal thread count value; performance evaluation; performance loss; performance saturation; resource utilization; runtime performance saturation detection; thread launching; thread level parallelism; thread scheduler; Computer architecture; Graphics processing units; Hardware; Instruction sets; Kernel; Message systems; Pipelines; GPGPU; Resource Utilization; Workload Scheduling;
Conference_Title :
Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
DOI :
10.1109/ICPPW.2014.14