Title :
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
Author :
Meng, Jiayuan ; Skadron, Kevin
Author_Institution :
Dept. of Comput. Sci., Univ. of Virginia, Charlottesville, VA, USA
Abstract :
Without high-bandwidth broadcast, large numbers of cores require a scalable point-to-point interconnect and a directory protocol. In such cases, a shared, inclusive last-level cache (LLC) can improve data sharing and avoid three-way communication for shared reads. However, if inclusion encompasses thread-private data, two problems arise with the shared LLC. First, current memory allocators align stack bases on page boundaries, which emerges as a source of severe conflict misses for large numbers of threads in data-parallel applications. Second, correctness does not require private data to reside in the shared directory or the LLC. This paper advocates stack-base randomization, which eliminates the major source of conflict misses for large numbers of threads. However, when capacity becomes a limitation for the directory or the LLC, this is not sufficient. We therefore propose a non-inclusive, semi-coherent cache organization (NISC) that removes the requirement for inclusion of private data and reduces capacity misses. Our data-parallel benchmarks show that these limitations prevent scaling beyond 8 cores, while our techniques allow scaling to at least 32 cores for most benchmarks. At 8 cores, stack randomization provides a mean speedup of 1.2X; at 32 cores it gives a speedup of 2.7X over the best baseline configuration. Compared with a conventional 2 MB LLC, our technique achieves similar performance with a 256 KB LLC, suggesting that LLCs may typically be overprovisioned. When very limited LLC resources are available, NISC can further improve system performance by 1.8X.
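Illustrative_Sketch :
For illustration, the stack-base randomization that the abstract proposes can be sketched in a few lines of C. This is a hypothetical sketch, not the paper's implementation: the 64 B line size, 4 KB page, alloca-based trampoline, and all identifiers are assumptions. Each thread skews its stack pointer by a random, cache-line-aligned amount before entering the worker, so worker stack frames in different threads stop sharing page alignment and stop mapping to the same LLC sets (compile with -pthread).

/*
 * Minimal sketch of the stack-base randomization idea from the abstract.
 * The 64 B line size, 4 KB page, alloca-based skew, and all identifiers are
 * assumptions for illustration, not the paper's implementation. A trampoline
 * shifts each thread's stack pointer by a random, cache-line-aligned amount
 * before entering the worker, so worker frames in different threads no
 * longer share page alignment and no longer collide in the same LLC sets.
 */
#include <alloca.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define LINE_SIZE 64        /* assumed cache line size    */
#define PAGE_SIZE 4096      /* skew stays within one page */
#define NTHREADS  8

static void worker(int id)
{
    int local = id;                         /* representative stack data */
    printf("thread %d: &local = %p\n", id, (void *)&local);
}

/* A volatile function pointer keeps the call out of line, so the worker's
 * frame really sits below the randomizing pad. */
static void (*volatile run_worker)(int) = worker;

static void *trampoline(void *arg)
{
    int id = (int)(long)arg;

    /* Random line-aligned skew in [64 B, 4 KB): breaks the page alignment
     * of stack bases that the abstract identifies as the source of severe
     * LLC conflict misses. */
    unsigned seed = (unsigned)id * 2654435761u;
    size_t skew = (size_t)(1 + rand_r(&seed) % (PAGE_SIZE / LINE_SIZE - 1))
                  * LINE_SIZE;

    volatile char *pad = alloca(skew);      /* shift the stack pointer  */
    pad[0] = 0;                             /* keep the allocation live */
    run_worker(id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, trampoline, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}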
Keywords :
cache storage; multiprocessing systems; parallel processing; cache thrashing; data parallel application; data sharing; directory protocol; last level cache; manycore scaling; memory allocators; non-inclusive semi-coherent cache organization; private data placement; scalable point-to-point interconnect; shared directory; stack base randomization; thread-private data; Broadcasting; Computer science; Hardware; Large-scale systems; Multithreading; Protocols; Sun; System performance; Throughput; Yarn;
Conference_Title :
2009 IEEE International Conference on Computer Design (ICCD 2009)
Conference_Location :
Lake Tahoe, CA
Print_ISBN :
978-1-4244-5029-9
ISSN :
1063-6404
DOI :
10.1109/ICCD.2009.5413143