Title :
DRAM access reduction in GPUs by thread-block scheduling for overlapped data reuse
Author :
Seungyeol Lee ; Wonyong Sung
Author_Institution :
Dept. of Electr. Eng., Seoul Nat. Univ., Seoul, South Korea
Abstract :
General-purpose graphics processing units (GPGPUs) achieve very high throughput when executing parallel programs. However, they usually demand very large DRAM bandwidth and consume considerable power for memory access. Although recent high-performance GPGPUs are equipped with an L2 cache to absorb some DRAM accesses, the cache hit ratio can hardly be very high because of the limited cache size. We propose a GPU thread-block scheduling method that makes better use of the L2 cache and reduces DRAM accesses. The method exploits inter-block locality when scheduling GPU thread blocks, and it can easily be implemented by modifying application programs. Applied to the Hotspot benchmark programs, the technique reduces DRAM accesses by up to 39%.
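The abstract does not spell out the scheduling order, so the following is only an illustrative sketch of how reordering thread blocks can increase reuse of overlapped data in a small cache. Everything in it is an assumption made for illustration: the strip-wise ordering, the function names `row_major_order`, `strip_order`, and `count_misses`, the one-block-at-a-time execution model, the five-point stencil footprint, and the LRU cache that holds whole tiles. It is not the authors' implementation and does not reproduce their measurements.

```python
from collections import OrderedDict


def row_major_order(gx, gy):
    """Default-style order: blocks issued row by row across the whole grid."""
    for y in range(gy):
        for x in range(gx):
            yield (x, y)


def strip_order(gx, gy, strip):
    """Hypothetical remapping: issue blocks in vertical strips `strip` blocks wide.

    Assumes gx is a multiple of strip. Blocks that share halo data are then
    issued close together in time.
    """
    for s in range(0, gx, strip):
        for y in range(gy):
            for x in range(s, s + strip):
                yield (x, y)


def count_misses(order, gx, gy, capacity):
    """Count misses in a toy LRU cache that stores `capacity` tiles.

    Each block reads its own tile plus its four neighbours (the overlapped
    halo), modelled at whole-tile granularity. Blocks run one at a time.
    """
    cache = OrderedDict()
    misses = 0
    for x, y in order:
        for tx, ty in ((x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if not (0 <= tx < gx and 0 <= ty < gy):
                continue  # neighbour lies outside the grid
            key = (tx, ty)
            if key in cache:
                cache.move_to_end(key)  # mark as most recently used
            else:
                misses += 1  # stands in for a DRAM access
                cache[key] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return misses
```

Calling `count_misses(row_major_order(16, 16), 16, 16, 16)` and `count_misses(strip_order(16, 16, 2), 16, 16, 16)` compares the two orders on a 16 x 16 grid of blocks with a 16-tile cache. In this toy model the strip order incurs fewer misses because a block's neighbours are revisited before they are evicted. On a real GPU the same idea would typically be expressed by remapping the block index inside the kernel, which is consistent with the abstract's remark that the method is implemented by modifying application programs.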
Keywords :
DRAM chips; cache storage; graphics processing units; scheduling; DRAM access reduction; DRAM bandwidth; DRAM memory access; GPU; Hotspot benchmark programs; L2 cache; application programs; cache hit ratio; cache size; general purpose graphics processing units; inter-block locality; overlapped data reuse; parallel programs; thread-block scheduling; Cache memory; Computer architecture; Graphics processing units; Instruction sets; Message systems; Random access memory; Strips;
Conference_Title :
Circuits and Systems (ISCAS), 2013 IEEE International Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-5760-9
DOI :
10.1109/ISCAS.2013.6571993