Title :
PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches
Author :
Chaudhuri, Mainak
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kanpur
Abstract :
As the last-level on-chip caches in chip-multiprocessors grow in size, the physical locality of on-chip data becomes important for delivering high performance. The non-uniform access latency seen by a core to the different independent banks of a large cache spread over the chip necessitates active mechanisms for improving data locality. The central proposal of this paper is a fully hardwired coarse-grain data migration mechanism that dynamically monitors the access patterns of the cores at the granularity of a page to reduce the book-keeping overhead, and decides when and where to migrate an entire page of data to amortize the performance overhead. The page-grain migration mechanism is compared against two variants of previously proposed cache-block-grain dynamic migration mechanisms and two OS-assisted static locality management mechanisms. Our detailed execution-driven simulation of an eight-core chip-multiprocessor with a shared 16 MB L2 cache, in which a bidirectional ring connects the cores and the L2 cache banks, shows that hardwired dynamic page migration, while requiring extra storage of only 4.8% of the total L2 cache and book-keeping budget, delivers the best performance and energy-efficiency across a set of shared memory parallel applications selected from the SPLASH-2, SPEC OMP, DARPA DIS, and FFTW suites, and multiprogrammed workloads prepared from the SPEC 2000 and BioBench suites. It reduces execution time by 18.7% and 12.6% on average (geometric mean) for the shared memory applications and the multiprogrammed workloads, respectively, compared to a baseline architecture that distributes pages round-robin across the L2 cache banks.
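To give a feel for the kind of policy the abstract describes, the sketch below models page-grain migration with a per-page access-counter table, a one-to-one core-to-bank affinity, and a fixed migration threshold. All of these structures and parameters are illustrative assumptions, not the paper's actual PageNUCA policies or hardware.

/* Hypothetical sketch of a page-grain migration policy in the spirit of the
 * abstract: a per-page table counts accesses from each core, and when a
 * remote core's accesses exceed a threshold, the page is "migrated" to that
 * core's closest L2 bank.  Table size, threshold, and the core-to-bank
 * mapping are illustrative assumptions, not the paper's mechanism. */
#include <stdio.h>
#include <string.h>

#define NUM_CORES          8
#define NUM_BANKS          8      /* assume one local L2 bank per core */
#define NUM_PAGES          1024   /* pages tracked by the hypothetical table */
#define MIGRATE_THRESHOLD  16     /* remote-access count that triggers migration */

typedef struct {
    unsigned access_count[NUM_CORES]; /* accesses to this page, per core */
    int home_bank;                    /* bank currently holding the page */
} PageEntry;

static PageEntry page_table[NUM_PAGES];

/* Round-robin initial placement, matching the baseline in the abstract. */
static void init_pages(void) {
    for (int p = 0; p < NUM_PAGES; p++) {
        memset(page_table[p].access_count, 0, sizeof page_table[p].access_count);
        page_table[p].home_bank = p % NUM_BANKS;
    }
}

/* Record an L2 access and decide whether to migrate the page.
 * Returns the bank that serves the page after this access. */
static int on_l2_access(int page, int core) {
    PageEntry *e = &page_table[page];
    e->access_count[core]++;

    int local_bank = core;  /* assumed core-to-bank affinity */
    if (e->home_bank != local_bank &&
        e->access_count[core] >= MIGRATE_THRESHOLD) {
        /* The requesting core dominates the access stream: move the page
         * next to it and reset the counters to learn a new pattern. */
        e->home_bank = local_bank;
        memset(e->access_count, 0, sizeof e->access_count);
    }
    return e->home_bank;
}

int main(void) {
    init_pages();
    /* Core 3 repeatedly touches page 5, which initially lives in bank 5. */
    for (int i = 0; i < 20; i++) {
        int bank = on_l2_access(5, 3);
        if (i % 5 == 4)
            printf("after access %2d, page 5 lives in bank %d\n", i + 1, bank);
    }
    return 0;
}

In this toy run, page 5 stays in its round-robin home bank until core 3's access count crosses the threshold, after which it is served from core 3's local bank; the real mechanism in the paper additionally weighs when and where to migrate so that the cost of moving an entire page is amortized.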
Keywords :
memory architecture; microprocessor chips; multiprocessing systems; storage management chips; L2 cache banks; PageNUCA; cache block-grain dynamic migration; data locality; hardwired coarse-grain data migration; hardwired dynamic page migration; non-uniform access latency; on-chip caches; on-chip data; page-grain locality management; page-grain migration; shared chip-multiprocessor caches; shared memory parallel application; Cache storage; Computer science; Data engineering; Delay; Energy storage; Engineering management; Proposals; SDRAM; Switches; Technology management;
Conference_Title :
2009 IEEE 15th International Symposium on High Performance Computer Architecture (HPCA 2009)
Conference_Location :
Raleigh, NC
Print_ISBN :
978-1-4244-2932-5
DOI :
10.1109/HPCA.2009.4798258