DocumentCode :
174640
Title :
A Thread-Aware Adaptive Data Prefetcher
Author :
Jiyang Yu ; Peng Liu
Author_Institution :
Dept. of Inf. Sci. & Electron. Eng., Zhejiang Univ., Hangzhou, China
fYear :
2014
fDate :
19-22 Oct. 2014
Firstpage :
278
Lastpage :
285
Abstract :
Most processors employ hardware data prefetching to hide memory access latencies. However, prefetching requests from different threads on a multi-core processor can interfere severely with the prefetching and/or demand requests of other threads. Data prefetching can therefore lead to significant performance degradation due to shared resource contention on shared-memory multi-core systems. This paper proposes a thread-aware data prefetching mechanism that uses low-overhead run-time information to tune prefetching modes and aggressiveness, mitigating resource contention in the memory system. Our solution has two new components: 1) a filtering mechanism that informs the hardware about which prefetching requests can cause shared data invalidation and should be discarded, and 2) a self-tuning prefetcher that uses run-time feedback to adjust each thread's data prefetching mode and arguments. On a set of parallel benchmarks, our thread-aware data prefetching mechanisms improve the overall performance of a 64-core system by 11% and reduce the energy-delay product by 13% over a multi-mode prefetch baseline system with a two-level cache organization and a conventional MESI-based directory coherence protocol. We compare our approach to the feedback-directed prefetching (FDP) technique and find that it provides better performance on multi-core systems while reducing the energy-delay product.
Keywords :
cache storage; feedback; multi-threading; resource allocation; shared memory systems; FDP technique; MESI-based directory coherence protocol; energy-delay product; feedback directed prefetching technique; filtering mechanism; hardware data prefetching; low-overhead run-time information; memory access latency; multicore processor; multimode prefetch baseline system; parallel benchmark; performance degradation; prefetching aggressiveness tuning; prefetching mode tuning; prefetching request; resource contention mitigation; run-time feedback; self-tuning prefetcher; shared data invalidation; shared memory multicore systems; shared resource contention; thread-aware adaptive data prefetcher; thread-aware data prefetching mechanism; two level cache organization; Accuracy; Engines; Hardware; Measurement; Prefetching;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Design (ICCD), 2014 32nd IEEE International Conference on
Conference_Location :
Seoul
Type :
conf
DOI :
10.1109/ICCD.2014.6974694
Filename :
6974694