Title :
Understanding How Non-uniform Distribution of Memory Accesses on Cache Sets Affects the System Performance of Chip Multiprocessors
Author :
Jia, Xiaomin ; Jiang, Jiang ; Ni, Xiaoqiang ; Zhao, Tianlei ; Qi, Shubo ; Fu, Guitao ; Zhang, Minxuan
Author_Institution :
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Non-uniform distribution of memory accesses across cache sets has been recognized as one source of inefficiency in cache architectures on single-core platforms. Several schemes target this problem to boost performance. As chip multiprocessors (CMPs) gain traction as the mainstream processor design choice, how the non-uniform distribution of memory accesses across cache sets affects cache management in CMPs remains an open question. We address this question by presenting several cache management schemes for CMP platforms that aim to balance the memory access distribution across cache sets in shared or private caches. We show that on CMP platforms with multiprogrammed workloads: (a) for shared caches, the non-uniform memory access distribution across cache sets is biased by the fact that multiple applications run concurrently and share the cache capacity; the scheme we put forward to exploit the non-uniformity to improve performance on shared caches yields little to no benefit and can even degrade performance; (b) for caches organized as private caches, directly adapting a scheme that targets this kind of non-uniformity outperforms the baseline private cache design by 2% on average; (c) however, for a private-cache-based cache management scheme that we propose, further effort to exploit this kind of non-uniformity on top of our proposed scheme also yields little to no benefit. We therefore conclude that on CMP platforms with multiprogrammed workloads, the non-uniform distribution of memory accesses across cache sets is partially circumvented by the interactions between multiple applications. Efforts to exploit the non-uniformity for further benefit may prove futile in CMPs.
Keywords :
cache storage; memory architecture; multiprocessing systems; CMP platform; cache architecture; cache capacity sharing; chip multiprocessor; mainstream processor design; multiprogrammed workload; nonuniform distribution; uniform memory access distribution; Arrays; Benchmark testing; Hidden Markov models; Indexes; Memory management; Protocols; Throughput; chip multiprocessors (CMPs); non-first level cache; non-uniform distribution of memory accesses on cache sets
Conference_Title :
Parallel and Distributed Processing with Applications Workshops (ISPAW), 2011 Ninth IEEE International Symposium on
Conference_Location :
Busan
Print_ISBN :
978-1-4577-0524-3
Electronic_ISBN :
978-0-7695-4429-8
DOI :
10.1109/ISPAW.2011.25