Title :
High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches
Author :
Jaleel, Aamer; Nuzman, Joseph; Moga, Adrian; Steely, Simon C.; Emer, Joel
Author_Institution :
Intel Corp., Hudson, MA, USA
Abstract :
Increasing transistor density enables adding more on-die cache real estate. However, devoting more space to the shared last-level cache (LLC) shifts the latency bottleneck from memory access latency to shared cache access latency. As such, applications whose working set is larger than the smaller caches spend a large fraction of their execution time on shared cache access latency. To address this problem, this paper investigates increasing the size of the smaller private caches in the hierarchy as opposed to increasing the shared LLC. Doing so improves average cache access latency for workloads whose working set fits into the larger private cache while retaining the benefits of a shared LLC. The consequence of increasing the size of the private caches is to relax inclusion and build exclusive hierarchies. Thus, for the same total caching capacity, an exclusive cache hierarchy provides better cache access latency. We observe that server workloads benefit tremendously from an exclusive hierarchy with large private caches, primarily because the large private caches accommodate the large code working sets of server workloads. For a 16-core CMP, an exclusive cache hierarchy improves server workload performance by 5-12% compared to an equal-capacity inclusive cache hierarchy. The paper also presents directions for further research to maximize the performance of exclusive cache hierarchies.
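A minimal sketch (not from the paper) of the abstract's latency argument: under a simple average-memory-access-time (AMAT) model, a hierarchy whose larger private cache absorbs more hits locally outperforms an equal-capacity inclusive organization that serves those hits from the slower shared LLC. All hit rates, latencies, and sizes below are illustrative assumptions, not measurements from the paper.

    # Illustrative AMAT comparison for equal-capacity inclusive vs. exclusive
    # two-level hierarchies. All numbers are assumed example values.

    def amat(levels, mem_latency):
        """Average memory access time for a list of (hit_rate, latency) levels.

        Each level's latency is paid by every access that reaches it; the
        hit_rate is local to that level, and misses fall through to the next
        level and finally to memory.
        """
        total, reach = 0.0, 1.0          # reach = fraction of accesses that get this far
        for hit_rate, latency in levels:
            total += reach * latency     # every access reaching this level pays its latency
            reach *= (1.0 - hit_rate)    # only local misses continue onward
        return total + reach * mem_latency

    MEM = 200  # assumed DRAM latency (cycles)

    # Inclusive: small private cache + large shared LLC that duplicates its contents.
    inclusive = amat([(0.60, 12), (0.30, 40)], MEM)

    # Exclusive: larger private cache + shared LLC used as a victim cache; same
    # total capacity, but more hits are served at the faster private level.
    exclusive = amat([(0.80, 14), (0.10, 40)], MEM)

    print(f"inclusive AMAT ~ {inclusive:.1f} cycles, exclusive AMAT ~ {exclusive:.1f} cycles")

With these assumed parameters the exclusive organization yields a noticeably lower AMAT, mirroring the abstract's claim that, at equal total capacity, relaxing inclusion trades shared-cache hits for faster private-cache hits.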
Keywords :
cache storage; storage management; 16-core CMP; caching capacity; capacity inclusive cache hierarchy; code working sets; exclusive cache hierarchies; large private caches; latency benefits; memory access latency; memory latency bottleneck; relax inclusion; server workloads; shared LLC; shared cache access latency; shared last level cache; transistor density; Manufacturing; Multicore processing; Multiprocessor interconnection; Prefetching; Sensitivity; Servers; System-on-chip; cache replacement; commercial workloads; exclusive; inclusive; server cache hierarchy;
Conference_Title :
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
Conference_Location :
Burlingame, CA
DOI :
10.1109/HPCA.2015.7056045