Title :
Locality-aware data replication in the Last-Level Cache
Author :
Kurian, George ; Devadas, Srinivas ; Khan, Omar
Author_Institution :
Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
Next-generation multicores will process massive data with varying degrees of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high-locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low-overhead yet highly accurate in-hardware run-time classification of data locality at the cache line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes, respectively.
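Illustrative sketch (not from the paper): the abstract states only that replication is allowed for cache lines with high reuse and that the decision adapts to LLC pressure at replica locations. The toy Python model below shows one way such a reuse-based decision could look. The class name `ReuseClassifier`, the threshold value, the per-(core, line) counters, and the eviction rule are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import defaultdict

# Hypothetical value; the abstract does not state a threshold.
REPLICATION_THRESHOLD = 3


class ReuseClassifier:
    """Toy per-core, per-line reuse tracker (illustrative assumption only)."""

    def __init__(self, threshold=REPLICATION_THRESHOLD):
        self.threshold = threshold
        # (core, line) -> accesses served by the home LLC slice so far
        self.home_reuse = defaultdict(int)
        # (core, line) pairs that currently hold a replica in the local slice
        self.replicas = set()

    def access(self, core, line):
        """Return where this access is served: 'replica' or 'home'."""
        key = (core, line)
        if key in self.replicas:
            return "replica"
        self.home_reuse[key] += 1
        if self.home_reuse[key] >= self.threshold:
            # Enough reuse observed: later accesses are served locally.
            self.replicas.add(key)
        return "home"

    def evict_replica(self, core, line, replica_reuse):
        """Drop a replica and decide how quickly it may be re-created.

        Assumption: a replica evicted after little reuse is treated as a
        sign of pressure in the local slice, so the line must earn
        replication again from zero. A well-used replica needs only one
        more access at the home slice.
        """
        key = (core, line)
        self.replicas.discard(key)
        if replica_reuse >= self.threshold:
            self.home_reuse[key] = self.threshold - 1
        else:
            self.home_reuse[key] = 0


if __name__ == "__main__":
    clf = ReuseClassifier(threshold=3)
    # Five accesses by core 0 to the same line.
    print([clf.access(0, 0x40) for _ in range(5)])
```

In this sketch a line with little reuse is always served from its home slice and never occupies space in the requester's slice, which is the intuition behind keeping the off-chip miss rate low while still shortening accesses to heavily reused lines.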
Keywords :
cache storage; pattern classification; resource allocation; storage management; tracking; LLC; cache line granularity; cache line replication; cache utilization optimization; coherence protocols; in-hardware run-time classification; last-level cache; locality cache lines; locality tracking mechanism; locality-aware data replication; locality-aware selective data replication protocol; memory access latency; network resource utilization optimization; next generation multicores; on-chip data locality; parallel benchmarks; protocol complexity; scalability concerns; sharer tracking structures; Coherence; Complexity theory; Multicore processing; Organizations; Protocols; Radiation detectors; System-on-chip;
Conference_Title :
High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on
Conference_Location :
Orlando, FL
DOI :
10.1109/HPCA.2014.6835921