Title :
PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches
Author :
Chaudhuri, Mainak
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kanpur
Abstract :
As the last-level on-chip caches in chip-multiprocessors grow in size, the physical locality of on-chip data becomes important for delivering high performance. The non-uniform access latency seen by a core to the different independent banks of a large cache spread over the chip necessitates active mechanisms for improving data locality. The central proposal of this paper is a fully hardwired coarse-grain data migration mechanism that dynamically monitors the access patterns of the cores at the granularity of a page to reduce the book-keeping overhead, and decides when and where to migrate an entire page of data to amortize the performance overhead. The page-grain migration mechanism is compared against two variants of previously proposed cache-block-grain dynamic migration mechanisms and two OS-assisted static locality management mechanisms. Our detailed execution-driven simulation of an eight-core chip-multiprocessor with a shared 16 MB L2 cache, in which a bidirectional ring connects the cores and the L2 cache banks, shows that hardwired dynamic page migration, while requiring extra storage of only 4.8% of the total L2 cache and book-keeping budget, delivers the best performance and energy-efficiency across a set of shared memory parallel applications selected from the SPLASH-2, SPEC OMP, DARPA DIS, and FFTW suites, and multiprogrammed workloads prepared from the SPEC 2000 and BioBench suites. It reduces execution time by 18.7% and 12.6% on average (geometric mean) for the shared memory applications and the multiprogrammed workloads, respectively, compared to a baseline architecture that distributes pages round-robin across the L2 cache banks.
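To give a feel for the kind of policy the abstract describes, the sketch below models page-grain migration with a per-page access-counter table, a one-to-one core-to-bank affinity, and a fixed migration threshold. All of these structures and parameters are illustrative assumptions, not the paper's actual PageNUCA policies or hardware.

/* Hypothetical sketch of a page-grain migration policy in the spirit of the
 * abstract: a per-page table counts accesses from each core, and when a
 * remote core's accesses exceed a threshold, the page is "migrated" to that
 * core's closest L2 bank.  Table size, threshold, and the core-to-bank
 * mapping are illustrative assumptions, not the paper's mechanism. */
#include <stdio.h>
#include <string.h>

#define NUM_CORES          8
#define NUM_BANKS          8      /* assume one local L2 bank per core */
#define NUM_PAGES          1024   /* pages tracked by the hypothetical table */
#define MIGRATE_THRESHOLD  16     /* remote-access count that triggers migration */

typedef struct {
    unsigned access_count[NUM_CORES]; /* accesses to this page, per core */
    int home_bank;                    /* bank currently holding the page */
} PageEntry;

static PageEntry page_table[NUM_PAGES];

/* Round-robin initial placement, matching the baseline in the abstract. */
static void init_pages(void) {
    for (int p = 0; p < NUM_PAGES; p++) {
        memset(page_table[p].access_count, 0, sizeof page_table[p].access_count);
        page_table[p].home_bank = p % NUM_BANKS;
    }
}

/* Record an L2 access and decide whether to migrate the page.
 * Returns the bank that serves the page after this access. */
static int on_l2_access(int page, int core) {
    PageEntry *e = &page_table[page];
    e->access_count[core]++;

    int local_bank = core;  /* assumed core-to-bank affinity */
    if (e->home_bank != local_bank &&
        e->access_count[core] >= MIGRATE_THRESHOLD) {
        /* The requesting core dominates the access stream: move the page
         * next to it and reset the counters to learn a new pattern. */
        e->home_bank = local_bank;
        memset(e->access_count, 0, sizeof e->access_count);
    }
    return e->home_bank;
}

int main(void) {
    init_pages();
    /* Core 3 repeatedly touches page 5, which initially lives in bank 5. */
    for (int i = 0; i < 20; i++) {
        int bank = on_l2_access(5, 3);
        if (i % 5 == 4)
            printf("after access %2d, page 5 lives in bank %d\n", i + 1, bank);
    }
    return 0;
}

In this toy run, page 5 stays in its round-robin home bank until core 3's access count crosses the threshold, after which it is served from core 3's local bank; the real mechanism in the paper additionally weighs when and where to migrate so that the cost of moving an entire page is amortized.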
Keywords :
memory architecture; microprocessor chips; multiprocessing systems; storage management chips; L2 cache banks; PageNUCA; cache block-grain dynamic migration; data locality; hardwired coarse-grain data migration; hardwired dynamic page migration; non-uniform access latency; on-chip caches; on-chip data; page-grain locality management; page-grain migration; shared chip-multiprocessor caches; shared memory parallel application; Cache storage; Computer science; Data engineering; Delay; Energy storage; Engineering management; Proposals; SDRAM; Switches; Technology management;
Conference_Title :
2009 IEEE 15th International Symposium on High Performance Computer Architecture (HPCA 2009)
Conference_Location :
Raleigh, NC
Print_ISBN :
978-1-4244-2932-5
DOI :
10.1109/HPCA.2009.4798258