Title :
Cooperative Caching for Chip Multiprocessors
Author :
Chang, Jichuan ; Sohi, Gurindar S.
Author_Institution :
Dept. of Comput. Sci., Wisconsin Univ., Madison, WI
Abstract :
This paper presents CMP cooperative caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through cooperation among private caches. Locally active data are attracted to the private caches by their accessing processors to reduce remote on-chip references, while globally active data are cooperatively identified and kept in the aggregate cache to reduce off-chip accesses. Examples of cooperation include cache-to-cache transfers of clean data, replication-aware data replacement, and global replacement of inactive data. These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or by the new implementation of a directory-based protocol presented in this paper. Our evaluation using full-system simulation shows that cooperative caching achieves an off-chip miss rate similar to that of a shared cache, and a local cache hit rate similar to that of using private caches. Cooperative caching performs robustly over a range of system/cache sizes and memory latencies. For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches. For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.
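The abstract names replication-aware data replacement without describing it. As a rough illustration only, the sketch below assumes the idea is to prefer evicting a block that has another copy elsewhere on chip, falling back to plain LRU when no such block exists. The `Block` type, its fields, and `pick_victim` are hypothetical names chosen for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Block:
    addr: int          # block address (hypothetical field)
    last_use: int      # larger value means more recently used
    replicated: bool   # assumed flag: another on-chip cache also holds this block


def pick_victim(cache_set: list[Block]) -> Block:
    """Return the block to evict from one cache set.

    Assumed policy: evict the least recently used replicated block if
    any exists, otherwise evict the least recently used block overall.
    """
    replicas = [b for b in cache_set if b.replicated]
    candidates = replicas if replicas else cache_set
    return min(candidates, key=lambda b: b.last_use)
```

Under this assumption, evicting a replica costs at most a remote on-chip reference later, whereas evicting the only on-chip copy may cost an off-chip access, which is the trade-off the abstract describes.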
Keywords :
cache storage; microprocessor chips; multi-threading; multiprocessing systems; protocols; CMP cooperative caching; accessing processors; cache coherence protocol; cache organizations; cache replacement policy; cache-to-cache transfers; chip multiprocessors; directory-based protocol; memory latency; multiprogrammed SPEC2000 workloads; multithreaded commercial workloads; off-chip miss rate; on-chip cache resources; on-chip references; private caches; replication-aware data replacement; Access protocols; Added delay; Aggregates; Cooperative caching; Costs; Proposals; Resource management; Robustness; Wire;
Conference_Title :
Computer Architecture, 2006. ISCA '06. 33rd International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
0-7695-2608-X
DOI :
10.1109/ISCA.2006.17