Title :
Loop Optimization for Divergence Reduction on GPUs with SIMT Architecture
Author_Institution :
Dept. of Commun. Syst., Jozef Stefan Inst., Ljubljana, Slovenia
Abstract :
The single-instruction multiple-thread (SIMT) architecture found in some of the latest graphical processing units (GPUs) builds on conventional single-instruction multiple-data (SIMD) parallelism while adopting the thread programming model. The architecture suffers from degraded performance caused by inefficient divergence handling, a problem hidden by the programmer's view of independent threads. A loop optimization technique with the potential to increase the efficiency of the core SIMD block while processing embedded divergences is investigated here. Concurrent loops are generally not bound to iterate in lock-step, which allows better alignment of thread flows via iteration scheduling. The efficiency of the concept is analyzed for fixed and flow-adapting scheduling policies. The proposed payoff model captures loop overhead implications, allowing one to assess the tradeoffs of applying the technique to a specific loop instance. Processing speedups can generally be observed in the total running time if kernels are compute-bound, as demonstrated by several examples. The studied iteration scheduling policies do not impose alterations to the core SIMD concept and design, thus preserving the benefits of data-level parallelism.
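The idea that loops need not iterate in lock-step can be illustrated with a toy cost model. The sketch below is not the paper's scheduling policy or payoff model: the function names, the two-branch loop body, the greedy majority rule, and the unit cost per issued branch body are all assumptions made for illustration, and loop overhead is ignored entirely.

```python
def lockstep_cost(paths):
    """Count branch-body issues when all threads iterate in lock-step.

    paths[t][i] is the branch (e.g. 'A' or 'B') that thread t takes in
    iteration i. In each iteration the SIMD block must issue every
    distinct branch body that at least one thread needs.
    """
    n_iter = len(paths[0])
    return sum(len({p[i] for p in paths}) for i in range(n_iter))


def scheduled_cost(paths):
    """Count branch-body issues under a hypothetical greedy schedule.

    Each step issues exactly one branch body: the one needed by the
    most threads for their next pending iteration. Threads needing a
    different branch are held back instead of diverging.
    """
    pos = [0] * len(paths)  # next pending iteration of each thread
    issues = 0
    while any(pos[t] < len(paths[t]) for t in range(len(paths))):
        pending = [paths[t][pos[t]]
                   for t in range(len(paths)) if pos[t] < len(paths[t])]
        # Majority vote; ties resolve to the alphabetically first branch.
        choice = max(sorted(set(pending)), key=pending.count)
        for t in range(len(paths)):
            if pos[t] < len(paths[t]) and paths[t][pos[t]] == choice:
                pos[t] += 1
        issues += 1
    return issues


# Two threads whose branch patterns are offset by one iteration.
warp = ["ABAB", "BABA"]
print(lockstep_cost(warp), scheduled_cost(warp))
```

For this input the lock-step model issues both branch bodies in every iteration, while delaying one thread by a single iteration lets most steps serve both threads at once. A real assessment would also have to charge for the extra loop iterations, which is the tradeoff the payoff model in the abstract addresses.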
Keywords :
graphics processing units; multi-threading; parallel processing; processor scheduling; GPUs; SIMD parallelism; SIMT architecture; concurrent loops; data level parallelism; divergence reduction; embedded divergence processing; fixed scheduling policy; flow-adapting scheduling policy; graphical processing units; iteration scheduling policy; loop optimization technique; single-instruction multiple data parallelism; single-instruction multiple thread architecture; thread programming model; Computer architecture; Dynamic scheduling; Graphics processing units; Instruction sets; Optimal scheduling; Schedules; Concurrent programming; SIMT; efficiency analysis; iteration scheduling; multithreaded processors; optimization;
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2014.2324587