DocumentCode :
119521
Title :
iCHAT: Inter-cache Hardware-Assistant Data Transfer for Heterogeneous Chip Multiprocessors
Author :
Junli Gu ; Bradford M. Beckmann ; Ting Cao ; Yu Hu
fYear :
2014
fDate :
6-8 Aug. 2014
Firstpage :
242
Lastpage :
251
Abstract :
Modern heterogeneous multiprocessors integrate a CPU and a GPU together to boost computational performance. Data sharing and communication between the CPU and GPU are critical to the final speedup. Tighter CPU-GPU integration makes it possible to share and move data more efficiently, and so to leverage the computational power that a GPU can provide. Initially, DMA or PCIe devices were used to transfer data between the CPU and GPU, with low efficiency and little flexibility. Recently, a single address space and coherent cache hierarchies have been adopted in heterogeneous architectures to share data more efficiently. This poses a new challenge: to understand the communication overheads in this new context and to improve communication efficiency for these architectures. This paper proposes a novel approach called iCHAT (inter-Cache Hardware-Assistant data Transfer) to manage data transfer between the CPU cache and the GPU cache efficiently. iCHAT detects communication patterns, eagerly evicts data from the owner's caches, and prepares for the requestor's demand. We implement the iCHAT design in a simulator based on gem5 and an AMD in-house GPU simulator. Experimental results show that communication-related eviction traffic is reduced by an average of 40% and total directory traffic by 8% on average. We also run a bounding experiment that quantitatively evaluates inter CPU-GPU transfers and requests to communication data; it indicates that iCHAT could achieve, on average, a 1.4x speedup for the Rodinia benchmark suite and a 1.2x speedup for AMD SDK applications.
Keywords :
cache storage; graphics processing units; multiprocessing systems; AMD in-house GPU simulator; CPU; GPU; address space; cache hierarchy; central processing unit; communication efficiency; communication related eviction traffic; computational performance; data sharing; gem5; graphics processing unit; heterogeneous chip multiprocessors; iCHAT technique; inter-cache hardware-assistant data transfer; Benchmark testing; Coherence; Computer architecture; Data transfer; Detectors; Graphics processing units; Hardware;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking, Architecture, and Storage (NAS), 2014 9th IEEE International Conference on
Conference_Location :
Tianjin
Type :
conf
DOI :
10.1109/NAS.2014.43
Filename :
6923186