DocumentCode :
3571468
Title :
Thorough Evaluation of GPU Shared Memory Load and Store Instructions
Author :
Okamoto, Satoshi ; Ito, Yasuaki ; Nakano, Koji ; Bordim, Jacir L.
Author_Institution :
Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan
fYear :
2014
Firstpage :
614
Lastpage :
616
Abstract :
This work measures the number of GPU clock cycles needed to execute load/store instructions under both bank-conflict and bank-conflict-free shared memory access patterns. To this end, the experiments vary several parameters: the number of warps (w), the number of memory bank conflicts (k), and the number of load/store instructions (l) per warp. From the experimental results, we obtain an estimate (E) of the number of clock cycles needed to execute l load/store instructions. The estimate is given by E = w · l · k · c1 + c2, where c1 and c2 are constants with values 1.047 and 337.7, respectively. We believe that this estimate can be used as an approximation of the number of clock cycles necessary to execute load and store instructions.
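To make the estimate concrete, the following is a minimal sketch that evaluates E = w · l · k · c1 + c2 with the constants quoted in the abstract. The function name `estimate_cycles`, the argument checks, and the treatment of a conflict-free access as k = 1 are assumptions made for illustration; they are not taken from the paper.

```python
# Constants quoted in the abstract.
C1 = 1.047   # multiplier on the product w * l * k
C2 = 337.7   # constant offset, in clock cycles


def estimate_cycles(w, l, k, c1=C1, c2=C2):
    """Evaluate E = w * l * k * c1 + c2 (hypothetical helper).

    w: number of warps
    l: number of load/store instructions per warp
    k: number of memory bank conflicts
       (assumption: k = 1 stands for a conflict-free access)
    """
    if w < 1 or l < 1 or k < 1:
        raise ValueError("w, l and k are assumed to be at least 1")
    return w * l * k * c1 + c2


if __name__ == "__main__":
    # Compare a conflict-free pattern with one that has k = 2,
    # using 32 warps and 4 load/store instructions per warp.
    for k in (1, 2):
        print(f"w=32, l=4, k={k}: about {estimate_cycles(32, 4, k):.1f} cycles")
```

Because the estimate is linear in the product w · l · k, doubling any one of the three parameters doubles the variable part of the estimate and leaves the constant offset c2 unchanged.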
Keywords :
graphics processing units; instruction sets; shared memory systems; GPU clock cycle; GPU shared memory load; GPU store instruction; bank conflict-free shared memory access pattern; Assembly; Clocks; Graphics processing units; Instruction sets; Memory management; Message systems; Synchronization; GPU; bank conflict; clock cycle measurement; shared memory;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing and Networking (CANDAR), 2014 Second International Symposium on
Type :
conf
DOI :
10.1109/CANDAR.2014.42
Filename :
7052260