Title :
Characterizing scalar opportunities in GPGPU applications
Author :
Zhongliang Chen ; Kaeli, David ; Rubin, Norman
Author_Institution :
Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
Abstract :
General-purpose computing on graphics processing units (GPGPU) has gained widespread adoption in both the high-performance and general-purpose computing communities. Most GPU computation follows a Single Instruction Multiple Data (SIMD) execution model. However, GPU execution typically pays little attention to whether the data operated upon by the SIMD units is the same or different. When a SIMD computation operates on multiple copies of the same data, it generates redundant computations. This redundancy provides an opportunity to improve efficiency by performing the computation once and broadcasting the result to multiple outputs. To better serve such operations, modern GPUs are equipped with scalar units. SIMD instructions that operate on the same input data operands can then be directed to the scalar units, which require only a single copy of the data, leaving the data-parallel SIMD units available to execute non-scalar operations. In this paper, we first characterize a number of CUDA programs taken from the NVIDIA SDK to quantify the potential for scalar execution. We observe that the compiler recognizes 38% of static SIMD instructions as operating on the same data, and that their dynamic occurrences account for 34% of the total dynamic instruction execution. We then evaluate the impact of scalar units on a heterogeneous scalar-vector GPU architecture. Our results show that scalar units are utilized 51% of the time during execution, though their use places additional pressure on the interconnect and memory.
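The notion of a scalar opportunity described in the abstract can be illustrated with a small Python sketch. It is not the paper's method: the paper relies on compiler analysis of CUDA programs, whereas this toy example simply checks, on a made-up trace, whether every lane of a SIMD instruction reads identical input operands. The names `WARP_SIZE`, `is_scalar_candidate`, and `scalar_fraction`, the 32-lane width, and the trace contents are all assumptions introduced for illustration.

```python
from typing import Sequence, Tuple

WARP_SIZE = 32  # assumed lane count, chosen only for illustration

Operands = Tuple[int, ...]  # input operand values read by one lane


def is_scalar_candidate(lane_operands: Sequence[Operands]) -> bool:
    """Return True when every lane reads the same input operands.

    Such an instruction computes the same result in every lane, so one
    computation could be performed once and its result broadcast.
    """
    if not lane_operands:
        return False
    first = lane_operands[0]
    return all(ops == first for ops in lane_operands)


def scalar_fraction(trace: Sequence[Sequence[Operands]]) -> float:
    """Fraction of instructions in a (hypothetical) trace that are scalar candidates."""
    if not trace:
        return 0.0
    hits = sum(1 for inst in trace if is_scalar_candidate(inst))
    return hits / len(trace)


# Toy trace: two instructions with lane-uniform operands, two without.
uniform = [(7, 3)] * WARP_SIZE                         # same operands in all lanes
divergent = [(lane, 3) for lane in range(WARP_SIZE)]   # first operand varies per lane
trace = [uniform, divergent, uniform, divergent]

fraction = scalar_fraction(trace)
print(f"scalar candidates: {fraction:.0%}")
```

In this sketch the check is dynamic and value-based; a compiler, as in the paper, would instead have to prove uniformity statically, which is generally more conservative.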
Keywords :
general purpose computers; graphics processing units; parallel architectures; CUDA programs; GPGPU applications; GPU computation; GPU execution; NVIDIA SDK; SIMD computation; SIMD model; characterizing scalar opportunity; compiler; data-parallel SIMD units; dynamic instruction execution; dynamic occurrences; general purpose community; general purpose computing; graphics processing units; heterogeneous scalar-vector GPU architecture; input data operands; multiple outputs; non-scalar operations; redundant computations; scalar execution; scalar units; single instruction multiple data model; static SIMD instructions; Computational modeling; Computer architecture; Graphics processing units; Instruction sets; Niobium; Registers; Vectors;
Conference_Title :
Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4673-5776-0
Electronic_ISBN :
978-1-4673-5778-4
DOI :
10.1109/ISPASS.2013.6557173