DocumentCode :
3600662
Title :
Adaptive Cache and Concurrency Allocation on GPGPUs
Author :
Zhong Zheng ; Zhiying Wang ; Lipasti, Mikko
Author_Institution :
State Key Lab. of High Performance Comput. & Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Volume :
14
Issue :
2
fYear :
2015
Firstpage :
90
Lastpage :
93
Abstract :
Memory bandwidth is critical to GPGPU performance. Exploiting locality in caches can better utilize memory bandwidth. However, memory requests issued by excessive threads cause cache thrashing and saturate memory bandwidth, degrading performance. In this paper, we propose adaptive cache and concurrency allocation (CCA) to prevent cache thrashing and improve the utilization of bandwidth and computational resources, hence improving performance. According to the locality and reuse distance of access patterns in GPGPU programs, warps on a stream multiprocessor are dynamically divided into three groups: cached, bypassed, and waiting. The data cache accommodates the footprint of cached warps. Bypassed warps cannot allocate cache lines in the data cache, which prevents cache thrashing, but they are still able to take advantage of available memory bandwidth and computational resources. Waiting warps are de-scheduled. Experimental results show that adaptive CCA can significantly improve benchmark performance, with an 80 percent harmonic mean IPC improvement over the baseline.
Keywords :
cache storage; concurrency control; graphics processing units; multi-threading; multiprocessing systems; performance evaluation; CCA; GPGPU performance improvement; access patterns; adaptive cache-and-concurrency allocation; bandwidth utilization improvement; benchmark performance improvement; bypassed warps; cache lines; cache locality; cache thrashing prevention; cached warps; computational resource utilization improvement; harmonic mean IPC improvement; memory bandwidth saturation; reuse distance; stream multiprocessor; waiting warp descheduling; Bandwidth; Benchmark testing; Cache memory; Concurrent computing; Instruction sets; Resource management; GPGPU; cache; concurrency;
fLanguage :
English
Journal_Title :
Computer Architecture Letters
Publisher :
IEEE
ISSN :
1556-6056
Type :
jour
DOI :
10.1109/LCA.2014.2359882
Filename :
6907961