Title :
Thread affinity mapping for irregular data access on shared Cache GPGPU
Author :
Kuo, Hsien-Kai ; Chen, Kuan-Ting ; Lai, Bo-Cheng Charles ; Jou, Jing-Yang
Author_Institution :
Dept. of Electron. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Date :
Jan. 30 2012-Feb. 2 2012
Abstract :
Memory coalescing and on-chip shared cache are two effective techniques for alleviating the memory bottleneck in modern GPGPUs. Both work well for applications with regular memory accesses. However, they become ineffective when concurrent threads issue large numbers of uncoordinated accesses, and the potential performance benefit can be significantly degraded. This paper proposes a thread affinity mapping methodology to coordinate irregular data accesses on shared-cache GPGPUs. Based on the proposed affinity metrics, threads are congregated into execution groups that can fully exploit the memory coalescing and data sharing within an application. An average runtime speedup of 3.5x is achieved on a Fermi GPGPU. The speedup scales with the size of the test cases, which makes the proposed methodology an effective and promising solution for the continually increasing complexity of applications in future many-core systems.
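The abstract does not define the affinity metrics, so the following is only a rough sketch of the general idea of congregating threads that touch the same data. It assumes a hypothetical affinity score (the number of cache lines a thread shares with a group), a greedy grouping pass, and a 128-byte cache line; the names `lines_touched` and `group_by_affinity` are illustrative and do not come from the paper.

```python
LINE_BYTES = 128  # assumed cache-line size in bytes (illustrative)


def lines_touched(addresses, line_bytes=LINE_BYTES):
    """Return the set of cache-line indices covered by a thread's byte addresses."""
    return {addr // line_bytes for addr in addresses}


def group_by_affinity(thread_addresses, group_size):
    """Greedily pack threads into groups that share cache lines.

    Hypothetical affinity score: the number of cache lines a candidate
    thread has in common with the lines already touched by the group.
    This is a stand-in, not the metric proposed in the paper.
    """
    footprints = [lines_touched(a) for a in thread_addresses]
    unassigned = list(range(len(footprints)))
    groups = []
    while unassigned:
        # Seed each new group with the lowest-numbered unassigned thread.
        seed = unassigned.pop(0)
        group = [seed]
        group_lines = set(footprints[seed])
        while len(group) < group_size and unassigned:
            # Pick the thread with the largest overlap; ties go to the lowest id.
            best = max(unassigned, key=lambda t: len(footprints[t] & group_lines))
            unassigned.remove(best)
            group.append(best)
            group_lines |= footprints[best]
        groups.append(group)
    return groups


# Four threads with irregular accesses; threads 0 and 2 share lines, as do 1 and 3.
accesses = [
    [0, 4096],     # thread 0
    [1024, 2048],  # thread 1
    [64, 4100],    # thread 2
    [1100, 2060],  # thread 3
]
groups = group_by_affinity(accesses, group_size=2)
print(groups)
```

With the sample data, grouping by thread index would place threads 0 and 1 together even though they share no cache lines, whereas the greedy pass pairs threads whose footprints overlap. A real mapping would use the hardware's execution-group size and the metrics defined in the paper.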
Keywords :
cache storage; general purpose computers; graphics processing units; shared memory systems; Fermi GPGPU; affinity metrics; concurrent threads; data sharing; general-purpose computing-on-graphics processing units; irregular data access; many-core systems; memory bottleneck; memory coalescing; on-chip shared cache; regular memory accesses; shared cache GPGPU; thread affinity mapping methodology; Equations; Instruction sets; Logic gates; Measurement; Memory management; Message systems; Runtime;
Conference_Title :
Design Automation Conference (ASP-DAC), 2012 17th Asia and South Pacific
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4673-0770-3
DOI :
10.1109/ASPDAC.2012.6165038