Title :
High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches
Author :
Jaleel, Aamer; Nuzman, Joseph; Moga, Adrian; Steely, Simon C.; Emer, Joel
Author_Institution :
Intel Corp., Hudson, MA, USA
Abstract :
Increasing transistor density enables adding more on-die cache real estate. However, devoting more space to the shared last-level cache (LLC) shifts the latency bottleneck from memory access latency to shared cache access latency. As such, applications whose working set is larger than the smaller caches spend a large fraction of their execution time on shared cache access latency. To address this problem, this paper investigates increasing the size of the smaller private caches in the hierarchy as opposed to increasing the shared LLC. Doing so improves average cache access latency for workloads whose working set fits into the larger private cache while retaining the benefits of a shared LLC. The consequence of increasing the size of the private caches is to relax inclusion and build exclusive hierarchies. Thus, for the same total caching capacity, an exclusive cache hierarchy provides better cache access latency. We observe that server workloads benefit tremendously from an exclusive hierarchy with large private caches, primarily because the large private caches accommodate the large code working sets of server workloads. For a 16-core CMP, an exclusive cache hierarchy improves server workload performance by 5-12% compared to an equal-capacity inclusive cache hierarchy. The paper also presents directions for further research to maximize the performance of exclusive cache hierarchies.
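A minimal sketch (not from the paper) of the abstract's latency argument: under a simple average-memory-access-time (AMAT) model, a hierarchy whose larger private cache absorbs more hits locally outperforms an equal-capacity inclusive organization that serves those hits from the slower shared LLC. All hit rates, latencies, and sizes below are illustrative assumptions, not measurements from the paper.

    # Illustrative AMAT comparison for equal-capacity inclusive vs. exclusive
    # two-level hierarchies. All numbers are assumed example values.

    def amat(levels, mem_latency):
        """Average memory access time for a list of (hit_rate, latency) levels.

        Each level's latency is paid by every access that reaches it; the
        hit_rate is local to that level, and misses fall through to the next
        level and finally to memory.
        """
        total, reach = 0.0, 1.0          # reach = fraction of accesses that get this far
        for hit_rate, latency in levels:
            total += reach * latency     # every access reaching this level pays its latency
            reach *= (1.0 - hit_rate)    # only local misses continue onward
        return total + reach * mem_latency

    MEM = 200  # assumed DRAM latency (cycles)

    # Inclusive: small private cache + large shared LLC that duplicates its contents.
    inclusive = amat([(0.60, 12), (0.30, 40)], MEM)

    # Exclusive: larger private cache + shared LLC used as a victim cache; same
    # total capacity, but more hits are served at the faster private level.
    exclusive = amat([(0.80, 14), (0.10, 40)], MEM)

    print(f"inclusive AMAT ~ {inclusive:.1f} cycles, exclusive AMAT ~ {exclusive:.1f} cycles")

With these assumed parameters the exclusive organization yields a noticeably lower AMAT, mirroring the abstract's claim that, at equal total capacity, relaxing inclusion trades shared-cache hits for faster private-cache hits.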
Keywords :
cache storage; storage management; 16-core CMP; caching capacity; capacity inclusive cache hierarchy; code working sets; exclusive cache hierarchies; large private caches; latency benefits; memory access latency; memory latency bottleneck; relax inclusion; server workloads; shared LLC; shared cache access latency; shared last level cache; transistor density; Manufacturing; Multicore processing; Multiprocessor interconnection; Prefetching; Sensitivity; Servers; System-on-chip; cache replacement; commercial workloads; exclusive; inclusive; server cache hierarchy;
Conference_Title :
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
Conference_Location :
Burlingame, CA
DOI :
10.1109/HPCA.2015.7056045