DocumentCode :
602635
Title :
The dual-path execution model for efficient GPU control flow
Author :
Rhu, Minsoo; Erez, M.
Author_Institution :
Electr. & Comput. Eng. Dept., Univ. of Texas at Austin, Austin, TX, USA
fYear :
2013
fDate :
23-27 Feb. 2013
Firstpage :
591
Lastpage :
602
Abstract :
Current graphics processing units (GPUs) utilize the single instruction multiple thread (SIMT) execution model. With SIMT, a group of logical threads executes such that all threads in the group execute a single common instruction on a particular cycle. To enable control flow to diverge within the group of threads, GPUs partially serialize execution and follow a single control flow path at a time. The execution of the threads in the group that are not on the current path is masked. Most current GPUs rely on a hardware reconvergence stack to track the multiple concurrent paths and to choose a single path for execution. Control flow paths are pushed onto the stack when they diverge and are popped off of the stack to enable threads to reconverge and keep lane utilization high. The stack algorithm guarantees optimal reconvergence for applications with structured control flow as it traverses the structured control-flow tree depth first. The downside of using the reconvergence stack is that only a single path is followed, which does not maximize available parallelism, degrading performance in some cases. We propose a change to the stack hardware in which the execution of two different paths can be interleaved. While this is a fundamental change to the stack concept, we show how dual-path execution can be implemented with only modest changes to current hardware and that parallelism is increased without sacrificing optimal (structured) control-flow reconvergence. We perform a detailed evaluation of a set of benchmarks with divergent control flow and demonstrate that the dual-path stack architecture is much more robust compared to previous approaches for increasing path parallelism. Dual-path execution either matches the performance of the baseline single-path stack architecture or outperforms single-path execution by 14.9% on average and by over 30% in some cases.
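The baseline single-path reconvergence stack described in the abstract can be sketched as a small simulation. This is an illustrative sketch only, not the authors' hardware design: the `Entry` record, the `CFG` table, the block names A to D, and the `run_warp` function are all hypothetical, and the reconvergence block is assumed to be given in advance for each branch. It shows how divergent paths are pushed with their active masks, executed one at a time, and popped at the reconvergence point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    pc: str              # basic block this entry executes next
    rpc: Optional[str]   # reconvergence block; the entry is popped when pc reaches it
    mask: int            # bitmask of threads active on this path


# Hypothetical control-flow graph: block -> (per-thread successor, reconvergence block).
# Block A diverges: even threads go to B, odd threads go to C; both rejoin at D.
CFG = {
    "A": (lambda tid: "B" if tid % 2 == 0 else "C", "D"),
    "B": (lambda tid: "D", None),
    "C": (lambda tid: "D", None),
    "D": (lambda tid: None, None),  # None means the thread exits
}


def run_warp(num_threads=4):
    """Return the sequence of (block, active mask) pairs the warp executes."""
    stack = [Entry("A", None, (1 << num_threads) - 1)]
    trace = []
    while stack:
        top = stack[-1]
        if top.pc == top.rpc:
            # This path reached its reconvergence point: pop and resume the entry below.
            stack.pop()
            continue
        trace.append((top.pc, top.mask))
        successor, rpc = CFG[top.pc]
        # Group the active threads by the block each one wants to execute next.
        targets = {}
        for tid in range(num_threads):
            if (top.mask >> tid) & 1:
                nxt = successor(tid)
                targets[nxt] = targets.get(nxt, 0) | (1 << tid)
        if len(targets) == 1:
            nxt = next(iter(targets))
            if nxt is None:
                stack.pop()      # all threads on this path have exited
            else:
                top.pc = nxt     # no divergence: keep following the same path
        else:
            # Divergence: the current entry waits at the reconvergence block,
            # and one entry per path is pushed on top of it.
            top.pc = rpc
            for nxt, path_mask in reversed(list(targets.items())):
                stack.append(Entry(nxt, rpc, path_mask))
    return trace


if __name__ == "__main__":
    for block, mask in run_warp(4):
        print(block, format(mask, "04b"))
```

In this sketch, blocks B and C each run with only half of the lanes active and strictly one after the other, which is the serialization the abstract identifies as the cost of following a single path. The proposed dual-path scheme would allow two such paths to be interleaved; that mechanism is not modeled here.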
Keywords :
concurrency control; graphics processing units; multi-threading; performance evaluation; tree searching; GPU control flow; SIMT execution model; control-flow reconvergence stack; dual-path execution; dual-path execution model; dual-path stack architecture; graphics processing units; hardware reconvergence stack; logical threads; optimal reconvergence; path parallelism; performance degradation; single control flow path; single instruction multiple thread execution model; single-path stack architecture; stack algorithm; stack hardware; structured control-flow tree depth first; Computer architecture; Graphics processing units; Hardware; Instruction sets; Microarchitecture; Parallel processing; Robustness;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on
Conference_Location :
Shenzhen
ISSN :
1530-0897
Print_ISBN :
978-1-4673-5585-8
Type :
conf
DOI :
10.1109/HPCA.2013.6522352
Filename :
6522352