DocumentCode :
2787992
Title :
Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs
Author :
Wu, Meng-Ju ; Yeung, Donald
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Maryland at Coll. Park, College Park, MD, USA
fYear :
2011
fDate :
10-14 Oct. 2011
Firstpage :
264
Lastpage :
275
Abstract :
Reuse distance (RD) analysis is a powerful memory analysis tool that can potentially help architects study multicore processor scaling. One key obstacle, though, is that multicore RD analysis requires measuring concurrent reuse distance (CRD) profiles across thread-interleaved memory reference streams. Sensitivity to memory interleaving makes CRD profiles architecture dependent, preventing their use in analyzing different processor configurations. For loop-based parallel programs, CRD profiles shift coherently to larger CRD values with core count scaling because interleaving threads are symmetric. Simple techniques can predict such shifting, making the analysis of numerous multicore configurations from a small set of CRD profiles feasible. Given the ubiquity and scalability of loop-level parallelism, such techniques will be extremely valuable for studying future large multicore designs. This paper investigates using RD analysis to efficiently analyze multicore cache performance for loop-based parallel programs, making several contributions. First, we provide an in-depth analysis of how CRD profiles change with core count scaling. Second, we develop techniques to predict CRD profile scaling, in particular employing reference groups to predict coherent shift, and evaluate prediction accuracy. Third, we show that core count scaling degrades performance only for last-level caches (LLCs) below 16MB for our benchmarks and problem sizes, increasing to 64-128MB if problem size scales by 64x. Finally, we apply CRD profiles to analyze multicore cache performance. When combined with existing problem scaling prediction, our techniques can predict LLC MPKI to within 11.1% of simulation across 1,728 configurations using only 36 measured CRD profiles.
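To make the notion of a concurrent reuse distance concrete, the following is a minimal illustrative sketch, not the paper's measurement method. It assumes an idealized round-robin interleaving of symmetric threads that touch only private data; the function names `reuse_distances`, `interleave`, and `profile` are hypothetical. Under these assumptions, the reuse distance of each reference grows when a second symmetric thread is interleaved, which is the kind of shift toward larger CRD values the abstract describes.

```python
from collections import Counter

def reuse_distances(trace):
    """For each reference, return the number of distinct addresses touched
    since the previous access to the same address (None on first access)."""
    stack = []  # LRU stack: most recently used address is at the end
    distances = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(addr)
    return distances

def interleave(*streams):
    """Round-robin interleaving of equal-length per-thread streams
    (an assumed, idealized interleaving)."""
    return [ref for group in zip(*streams) for ref in group]

def profile(trace):
    """Histogram of reuse distances, ignoring first-time accesses."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

# Two symmetric threads, each looping over its own private pair of addresses.
t0 = ["A0", "B0", "A0", "B0"]
t1 = ["A1", "B1", "A1", "B1"]

single = profile(t0)                    # per-thread RD profile
concurrent = profile(interleave(t0, t1))  # CRD profile of the interleaved stream
print(single, concurrent)
```

In this toy trace every reuse has distance 1 within a single thread and distance 3 in the two-thread interleaved stream. Shared data and irregular interleaving, which real programs exhibit, are deliberately left out of the sketch.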
Keywords :
cache storage; multi-threading; multiprocessing systems; CRD profile scaling; coherent profile; concurrent reuse distance; last level caches; loop-based parallel programs; loop-level parallelism; memory analysis tool; memory interleaving; multicore RD analysis; multicore cache performance; multicore configuration; multicore design; multicore processor scaling; multicore scaling; processor configuration; reuse distance analysis; scaling prediction; sensitivity; thread-interleaved memory reference streams; Accuracy; Benchmark testing; Instruction sets; Memory management; Multicore processing; Parallel processing; Shape;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
Conference_Location :
Galveston, TX
ISSN :
1089-795X
Print_ISBN :
978-1-4577-1794-9
Type :
conf
DOI :
10.1109/PACT.2011.58
Filename :
6113835