DocumentCode :
186348
Title :
Characterization and analysis of dynamic parallelism in unstructured GPU applications
Author :
Wang, Jin ; Yalamanchili, Sudhakar
Author_Institution :
Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2014
fDate :
26-28 Oct. 2014
Firstpage :
51
Lastpage :
60
Abstract :
GPUs have proven very effective for structured applications. However, emerging data-intensive applications are increasingly unstructured: irregular in their memory and control-flow behavior over massive data sets. While the irregularity in these applications can result in poor workload balance among fine-grained threads or coarse-grained blocks, one can still observe dynamically formed pockets of structured data parallelism that can locally exploit the GPU's compute and memory bandwidth effectively. In this study, we seek to characterize such dynamically formed parallelism and evaluate implementations designed to exploit it using CUDA Dynamic Parallelism (CDP), an execution model in which parallel workloads are launched dynamically from within kernels when pockets of structured parallelism are detected. We characterize and evaluate such implementations by measuring their impact on control-flow and memory behavior on commodity hardware. In particular, the study targets a comprehensive understanding of the overhead of current CDP support in GPUs in terms of kernel launch, memory footprint and algorithm overhead. Experiments show that while the CDP implementations can potentially achieve 1.13x-2.73x speedups over non-CDP implementations, the non-trivial overhead results in an average overall slowdown of 1.21x.
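To make the CDP execution model concrete, the sketch below is a small host-side Python model (not CUDA, and not the authors' implementation) of the decision a parent-kernel thread makes: if its work item is large enough to form a "pocket" of structured parallelism, it hands the item to a child grid; otherwise it processes the item itself. In real CDP code the child would be launched from device code with the usual `child<<<grid, block>>>(...)` syntax inside a `__global__` function. The threshold test, the `threshold` and `block_size` values, the helper names `plan_launches` and `LaunchPlan`, and the BFS-style degree list are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LaunchPlan:
    inline_items: int = 0    # work items a parent thread processes itself
    inline_work: int = 0     # total work units processed inline
    child_launches: int = 0  # modeled device-side child-kernel launches
    child_blocks: int = 0    # thread blocks summed over all child launches


def plan_launches(work_per_thread: List[int], threshold: int, block_size: int) -> LaunchPlan:
    """Model the per-thread choice in a hypothetical CDP-style parent kernel.

    A parent thread whose work item exceeds `threshold` hands it to a child
    grid of ceil(work / block_size) blocks; smaller items stay in the thread.
    Both `threshold` and `block_size` are assumed tuning knobs.
    """
    plan = LaunchPlan()
    for work in work_per_thread:
        if work > threshold:
            plan.child_launches += 1
            # Integer ceiling division: blocks needed to cover `work` items.
            plan.child_blocks += (work + block_size - 1) // block_size
        else:
            plan.inline_items += 1
            plan.inline_work += work
    return plan


if __name__ == "__main__":
    # Made-up adjacency-list lengths for the vertices of one BFS frontier.
    degrees = [3, 1, 700, 4, 2, 130, 5]
    print(plan_launches(degrees, threshold=64, block_size=128))
    # LaunchPlan(inline_items=5, inline_work=15, child_launches=2, child_blocks=8)
```

In this toy model, raising the threshold reduces the number of child launches, and with it the launch cost that each one incurs, at the price of leaving more irregular work inside the parent threads; lowering it does the opposite.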
Keywords :
graphics processing units; resource allocation; storage management; CUDA dynamic parallelism; coarse-grained blocks; control flow behavior; dynamic parallelism analysis; dynamic parallelism characterization; dynamically formed parallelism; dynamically formed pockets; emerging data intensive applications; fine-grained threads; massive data sets; memory behavior; structured applications; structured data parallelism; unstructured GPU applications; workload balance; Arrays; Graphics processing units; Instruction sets; Kernel; Measurement; Parallel processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Workload Characterization (IISWC), 2014 IEEE International Symposium on
Conference_Location :
Raleigh, NC
Print_ISBN :
978-1-4799-6452-9
Type :
conf
DOI :
10.1109/IISWC.2014.6983039
Filename :
6983039