DocumentCode :
3539864
Title :
CPU-assisted GPGPU on fused CPU-GPU architectures
Author :
Yang, Yi ; Xiang, Ping ; Mantor, Mike ; Zhou, Huiyang
Author_Institution :
Dept. of Electr. & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA
fYear :
2012
fDate :
25-29 Feb. 2012
Firstpage :
1
Lastpage :
12
Abstract :
This paper presents a novel approach that utilizes CPU resources to facilitate the execution of GPGPU programs on fused CPU-GPU architectures. In our model of fused architectures, the GPU and the CPU are integrated on the same die and share the on-chip L3 cache and off-chip memory, similar to the latest Intel Sandy Bridge and AMD accelerated processing unit (APU) platforms. In our proposed CPU-assisted GPGPU, after the CPU launches a GPU program, it executes a pre-execution program, which is generated automatically from the GPU kernel using our proposed compiler algorithms and contains the memory access instructions of the GPU kernel for multiple thread blocks. The CPU pre-execution program runs ahead of the GPU threads because (1) the CPU pre-execution thread contains only the memory fetch instructions from the GPU kernel, not its floating-point computations, and (2) the CPU runs at higher frequencies and exploits higher degrees of instruction-level parallelism than GPU scalar cores. We also leverage the prefetcher at the L2 cache on the CPU side to increase the memory traffic issued from the CPU. As a result, the memory accesses of GPU threads hit in the L3 cache and their latency is drastically reduced. Since our pre-execution is directly controlled by user-level applications, it enjoys both high accuracy and flexibility. Our experiments on a set of benchmarks show that our proposed pre-execution improves performance by up to 113%, and by 21.4% on average.
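To make the pre-execution idea concrete, below is a minimal hand-written sketch in CUDA C, not the authors' compiler-generated code: a simple GPU kernel and a CPU routine that replays only the kernel's memory reads for a window of upcoming thread blocks. The names vec_add, pre_execute_vec_add, blockSize, and sink are illustrative assumptions, and the sketch assumes the buffers are visible to both the CPU and the GPU (e.g., allocated with cudaMallocManaged or residing in the shared memory of a fused CPU-GPU chip), so that the CPU's touches can warm a cache level the GPU later hits in.

// Illustrative sketch only (CUDA C), assuming unified/shared memory so the
// host can touch the same buffers the kernel will read. The paper generates
// the pre-execution program automatically with compiler algorithms; this
// hand-written version merely shows its structure.
#include <cuda_runtime.h>

// Ordinary GPU kernel: one load from a, one from b, one store to c per thread.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// CPU pre-execution: replay only the kernel's memory reads for thread blocks
// [firstBlock, firstBlock + numBlocks), skipping all computation and stores,
// so the touched cache lines are pulled toward the cache shared with the GPU.
// 'sink' is a volatile target that keeps the loads from being optimized away.
static void pre_execute_vec_add(const float *a, const float *b, int n,
                                int blockSize, int firstBlock, int numBlocks,
                                volatile float *sink) {
    for (int blk = firstBlock; blk < firstBlock + numBlocks; ++blk) {
        // Stride by 16 floats (~one 64-byte cache line) instead of visiting
        // every element: touching one word per line is enough to fetch it.
        for (int t = 0; t < blockSize; t += 16) {
            int i = blk * blockSize + t;
            if (i < n)
                *sink = a[i] + b[i];
        }
    }
}

In the paper's scheme, the CPU launches the GPU kernel and then runs such a pre-execution program ahead of the GPU threads; the benefit relies on the fused architecture's shared L3 cache, so this sketch would have no effect on a discrete GPU.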
Keywords :
cache storage; graphics processing units; program compilers; AMD accelerated processing unit platforms; CPU preexecution program; CPU resource utilization; CPU-GPU architecture fusion; CPU-assisted GPGPU; GPGPU program execution; GPU kernel; Intel Sandy Bridge; L2-cache; compiler algorithms; instruction-level parallelism; memory access instructions; memory fetch instructions; multiple thread-blocks; off-chip memory; on-chip L3 cache; prefetcher; user-level applications; Central Processing Unit; Computer architecture; Graphics processing unit; Kernel; Prefetching; System-on-a-chip;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on
Conference_Location :
New Orleans, LA
ISSN :
1530-0897
Print_ISBN :
978-1-4673-0827-4
Electronic_ISBN :
1530-0897
Type :
conf
DOI :
10.1109/HPCA.2012.6168948
Filename :
6168948
Link To Document :
https://doi.org/10.1109/HPCA.2012.6168948